Working with Layout and Blocks
When processing a document, ABBYY FineReader Engine first analyzes its layout and detects certain areas on the document pages. These areas are called "blocks." Blocks determine how and in what order the image areas should be recognized.
In ABBYY FineReader Engine, the Layout object serves as storage for blocks and recognized text. The basic document processing scenarios work with the layout within the FRDocument object, which represents the document being processed. To access the layout of a document page, use the IFRPage::Layout property.
If you use the Engine object loaded as an out-of-process server, you can speed up iteration over the Layout object with the following scheme:
- Use the Engine object loaded as an out-of-process server to process a document and get a layout for each page. Save each layout as an array of bytes using the SaveToArray method.
- Use the Engine object loaded natively (i.e., loaded in the current process from the libFREngine.so library) to read the obtained array of bytes with your implementation of the IReadStream interface.
- Use the Engine object loaded natively to restore a copy of the original layout with the CreateLayoutFromStream method. Use this copy for any further iteration over the layout contents.
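The byte-array hand-off in the scheme above can be modelled with an in-memory stream. The Python sketch below is only an illustration and makes no Engine calls: MemoryReadStream is a hypothetical stand-in for your IReadStream implementation, the read and close signatures are assumptions, and the placeholder bytes stand in for whatever SaveToArray would return.

```python
class MemoryReadStream:
    """Hypothetical stand-in for an IReadStream implementation backed by a byte array."""

    def __init__(self, data: bytes) -> None:
        self._data = bytes(data)
        self._pos = 0
        self.closed = False

    def read(self, count: int) -> bytes:
        """Return up to `count` bytes and advance; empty bytes signal end of data."""
        if self.closed:
            raise ValueError("stream is closed")
        chunk = self._data[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def close(self) -> None:
        self.closed = True


def read_all(stream: MemoryReadStream, chunk_size: int = 4096) -> bytes:
    """Drain the stream chunk by chunk, as a consumer of the stream might do."""
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    stream.close()
    return b"".join(parts)


# Placeholder bytes; in the real scheme they would come from SaveToArray
# called through the out-of-process Engine (assumption for illustration).
saved_layout = bytes(range(10))
restored = read_all(MemoryReadStream(saved_layout), chunk_size=4)
```

The point of the design is that the expensive cross-process work happens once, when the bytes are produced; every later read is served from memory in the current process.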
Layout blocks
The Layout object provides access to the layout structure via the Blocks and BlackSeparators properties. Both properties provide access to LayoutBlocks subobjects, which represent collections of blocks. Blocks refers to the main set of layout blocks, which contains text, tables, pictures, barcodes, and checkmarks. BlackSeparators refers to the collection of separator blocks. Separators are black lines detected during page layout analysis; they are used for more precise page layout reconstruction during synthesis and export.
You can also get the blocks in a logically sorted order through the SortedBlocks property of the Layout object.
Each block has its own region, which consists of one or more rectangles. A region is represented by the Region object.
Blocks come in different types depending on the data they contain, and each type has its own specific properties. These properties are accessible via the corresponding block type objects, which can be obtained using the methods of the Block object. The corresponding block type interfaces are derived from the IBlock interface and inherit all its properties. The following block types are available:
- Text block
- Table block
- Raster picture block
- Vector picture block
- Barcode block
- Checkmark block
- Checkmarks group block
- Separator block
- Separators group block
Adding blocks manually
Blocks are found on a page automatically during layout analysis. However, you may want to add blocks manually. To do so:
- Open the FRPage object and obtain the page layout via the Layout property.
- Create a Region object for the block using the IEngine::CreateRegion method and add rectangles to it using the IRegion::AddRect method.
- Create a block of the required type and add it into the collection of layout blocks using the AddNew method of the LayoutBlocks object.
- Set the required parameters of the block (use the block properties object corresponding to the type of block).
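The steps above can be sketched with simplified stand-in classes. This is a Python model of the workflow under stated assumptions, not the Engine API: Region, Block, and LayoutBlocks here are hypothetical classes, the (left, top, right, bottom) argument order of add_rect and the (type, region) arguments of add_new are assumptions, and the "text" type name is illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # assumed order: (left, top, right, bottom)


@dataclass
class Region:
    """Stand-in for the Region object: a set of rectangles."""
    rects: List[Rect] = field(default_factory=list)

    def add_rect(self, left: int, top: int, right: int, bottom: int) -> None:
        if left >= right or top >= bottom:
            raise ValueError("rectangle must have positive width and height")
        self.rects.append((left, top, right, bottom))


@dataclass
class Block:
    """Stand-in for a layout block: a type fixed at creation plus a region."""
    block_type: str
    region: Region


@dataclass
class LayoutBlocks:
    """Stand-in for the LayoutBlocks collection."""
    items: List[Block] = field(default_factory=list)

    def add_new(self, block_type: str, region: Region) -> Block:
        block = Block(block_type, region)
        self.items.append(block)
        return block


# Build a region from two rectangles, then add a block that uses it.
region = Region()
region.add_rect(100, 200, 900, 400)
region.add_rect(100, 400, 500, 600)

blocks = LayoutBlocks()
new_block = blocks.add_new("text", region)
```

In the model, type-specific parameters would be set on new_block after it is added, which mirrors the last step of the list.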
Changing the block type
The block type is defined during creation and cannot be changed. If you need to change the block type, you will have to delete the block and create another block of the necessary type in exactly the same place:
- Create a Region object using the IEngine::CreateRegion method and copy into it the region of the block you need to replace, using the IRegion::CopyFrom method.
- Delete the old block from the layout by calling the ILayoutBlocks::DeleteAt method.
- Create a new block of the required type and add it into the collection of layout blocks using the AddNew method of the LayoutBlocks object. Pass the Region you copied from the old block as one of the required parameters.
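The order of operations matters here: the region has to be copied before the old block is deleted. A minimal Python sketch of that sequence follows, again using hypothetical stand-in classes rather than the Engine API; the method signatures, the helper replace_block_type, and the type names "picture" and "text" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """Stand-in for the Region object."""
    rects: List[Tuple[int, int, int, int]] = field(default_factory=list)

    def copy_from(self, other: "Region") -> None:
        # Copy the rectangles so the new region shares no state with the old one.
        self.rects = list(other.rects)


@dataclass
class Block:
    block_type: str
    region: Region


@dataclass
class LayoutBlocks:
    """Stand-in for the LayoutBlocks collection."""
    items: List[Block] = field(default_factory=list)

    def add_new(self, block_type: str, region: Region) -> Block:
        block = Block(block_type, region)
        self.items.append(block)
        return block

    def delete_at(self, index: int) -> None:
        del self.items[index]


def replace_block_type(blocks: LayoutBlocks, index: int, new_type: str) -> Block:
    """Hypothetical helper: swap the block at `index` for one of `new_type`."""
    region = Region()
    region.copy_from(blocks.items[index].region)  # 1. copy the region first
    blocks.delete_at(index)                       # 2. delete the old block
    return blocks.add_new(new_type, region)       # 3. add the replacement


blocks = LayoutBlocks()
blocks.add_new("picture", Region([(0, 0, 50, 50)]))
replacement = replace_block_type(blocks, 0, "text")
```

In this model add_new appends to the end of the collection, so the replacement may not keep the index of the block it replaced; code that tracks blocks by index should look the new block up again.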
7/3/2024 8:50:25 AM