When processing a document, ABBYY FineReader Engine first analyzes its layout and detects certain areas on the document pages. These areas are called "blocks." Blocks determine how and in what order the image areas should be recognized.
In ABBYY FineReader Engine, the Layout object serves as storage for blocks and recognized text. The basic document processing scenarios work with the layout within the FRDocument object, which represents the document being processed. To access the layout of a document page, use the IFRPage::Layout property.
Layout blocks
The Layout object provides access to the layout structure via the Blocks and BlackSeparators properties. Both properties provide access to LayoutBlocks subobjects, which represent collections of blocks. Blocks refers to the main set of layout blocks, which contains texts, tables, pictures, barcodes, and checkmarks. BlackSeparators refers to the collection of separator blocks. Separators are black lines that are detected during the page layout analysis. They are used for more precise page layout reconstruction during synthesis and export.
You can also get the blocks in logically sorted order via the SortedBlocks property of the Layout object.
Each block has a region, which consists of one or more rectangles. A region is represented by the Region object.
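To make the idea of a region built from rectangles concrete, here is a minimal self-contained sketch. It does not call ABBYY FineReader Engine: the Rect and Region classes and the add_rect, is_rectangular, and bounding_rect names are hypothetical stand-ins, and the coordinates are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image coordinates (hypothetical model)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Region:
    """Hypothetical stand-in for a block region: a list of rectangles."""
    rects: List[Rect] = field(default_factory=list)

    def add_rect(self, left: int, top: int, right: int, bottom: int) -> None:
        # Reject empty or inverted rectangles.
        if left >= right or top >= bottom:
            raise ValueError("rectangle must have positive width and height")
        self.rects.append(Rect(left, top, right, bottom))

    def is_rectangular(self) -> bool:
        # True when the region is a single rectangle, as a table block requires.
        return len(self.rects) == 1

    def bounding_rect(self) -> Rect:
        # Smallest rectangle that encloses every rectangle of the region.
        if not self.rects:
            raise ValueError("region is empty")
        return Rect(
            min(r.left for r in self.rects),
            min(r.top for r in self.rects),
            max(r.right for r in self.rects),
            max(r.bottom for r in self.rects),
        )


# An L-shaped region assembled from two rectangles.
l_shape = Region()
l_shape.add_rect(100, 100, 500, 200)
l_shape.add_rect(100, 200, 300, 400)
```

A region like l_shape could describe a text block that wraps around a picture, but it would not be valid for a block type that must be rectangular.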
Blocks come in different types depending on the data they contain, and each type has its own specific properties. These properties are accessible via the corresponding block type objects, which can be obtained using the methods of the Block object. The corresponding block type interfaces are derived from the IBlock interface and inherit all its properties. The following block types are available:
Text block
This block type corresponds to an image zone recognized as formatted text. Properties of this block are accessible via the TextBlock object. This object also provides access to the recognized text from the part of the image enclosed by the block.
Table block
This block type corresponds to a table. Blocks of this type may only be rectangular (the region contains only one rectangle). The properties of this block type are accessible via the TableBlock object.
The structure of the table is described by two collections of table separators, horizontal and vertical (the TableSeparators objects), and by a collection of table cells (the TableCells object). Each table cell is treated as a block of some type. The recognized text is a property of a single cell, not of the entire table. If a cell is a picture, the image enclosed in the cell bounds is not recognized and is displayed as a picture in the recognized text.
Table separators may be of different types. A separator type is defined for a segment of a separator that lies between its nearest intersections with other separators. Separators may be of the following types:
- Absent. This type is assigned to table separators that "should go" through merged cells.
- Unknown. This type is assigned by default to every newly added table separator.
- Invisible. This type is assigned to an "imaginary" table separator created as a result of table structure analysis in a place where the source table did not have one but where it should logically be.
- Explicit. Table separators of this type appear where the black lines of the source table are located.
- Multiple. This type of separator may appear as a result of table editing.
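The per-segment nature of separator types can be illustrated with a small model. This is a sketch only: the SeparatorType enum and the split_into_segments helper are hypothetical and are not the Engine's own enumeration or API. It shows one vertical separator being split at the points where horizontal separators cross it, with each resulting segment carrying its own type.

```python
from enum import Enum
from typing import List, Tuple


class SeparatorType(Enum):
    """Hypothetical mirror of the separator types listed above."""
    ABSENT = "absent"
    UNKNOWN = "unknown"
    INVISIBLE = "invisible"
    EXPLICIT = "explicit"
    MULTIPLE = "multiple"


def split_into_segments(start: int, end: int,
                        crossings: List[int]) -> List[Tuple[int, int]]:
    """Split a separator spanning [start, end] at each interior crossing."""
    inner = sorted({c for c in crossings if start < c < end})
    points = [start, *inner, end]
    return list(zip(points, points[1:]))


# A vertical separator from y=0 to y=300, crossed by horizontal
# separators at y=100 and y=200.
segments = split_into_segments(0, 300, [100, 200])

# Every segment starts out as UNKNOWN, the default for new separators.
segment_types = {seg: SeparatorType.UNKNOWN for seg in segments}

# The top segment follows a black line of the source table; the middle
# segment would pass through a merged cell.
segment_types[(0, 100)] = SeparatorType.EXPLICIT
segment_types[(100, 200)] = SeparatorType.ABSENT
```

Keeping the type on the segment rather than on the whole separator is what lets a single line be visible in one row and absent in the next.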
Raster picture block
This block type represents an image zone treated as a raster picture. The part of the image that this block encloses is not recognized, and the block is exported "as is." The properties of this block type are represented by the RasterPictureBlock object.
Vector picture block
This block type represents an image zone treated as a vector picture. Blocks of this type may appear in the layout only if a page has been analyzed with the IPageAnalysisParams::DetectVectorGraphics property set to TRUE. Usually, background pictures are recognized as blocks of this type. The properties of this block type are represented by the VectorPictureBlock object.
Barcode block
A part of the image enclosed by a block of this type is treated as a barcode. ABBYY FineReader Engine recognizes barcodes of several types and can also detect barcode types automatically. The information read from a recognized barcode is accessible via the barcode-specific block properties represented by the BarcodeBlock object.
Checkmark block
This block type corresponds to an image area recognized as a checkmark. The information read from a recognized checkmark is accessible via the checkmark-specific block properties represented by the CheckmarkBlock object.
Checkmarks group block
This block type corresponds to an image area recognized as a checkmarks group. The information read from a recognized checkmarks group is accessible via the group-specific block properties represented by the CheckmarkGroup object.
Separator block
A part of the image enclosed by a block of this type is treated as a separator. Separators are lines that are detected during the page layout analysis. They may be parts of a table, lines that separate different text elements, etc. The coordinates and type of a separator are accessible via the SeparatorBlock object.
Separators group block
This block type corresponds to an image zone recognized as a group of separators. A separators group usually includes four separators that form a rectangle. For example, the four lines of a table border are recognized as a separators group. Each separators group contains a collection of separator blocks. The specific properties of a separators group block are represented by the SeparatorGroup object.
Adding blocks manually
Blocks are detected on a page automatically during layout analysis, but you can also add blocks manually. To do so:
- Open the FRPage object and obtain the page layout via the Layout property.
- Create a Region object for the block using the IEngine::CreateRegion method and add rectangles to it using the IRegion::AddRect method.
- Create a block of the required type and add it to the collection of layout blocks using the AddNew method of the LayoutBlocks object.
- Set the required parameters of the block (use the block properties object corresponding to the type of block).
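The steps above can be sketched with a small in-memory model. The code below does not call ABBYY FineReader Engine; Region, Block, LayoutBlocks, and Layout are hypothetical stand-ins, add_rect and add_new only mimic the roles of IRegion::AddRect and the AddNew method, and the block type string and the "note" property are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Region:
    rects: List[Rect] = field(default_factory=list)

    def add_rect(self, left: int, top: int, right: int, bottom: int) -> None:
        # Plays the role of IRegion::AddRect in this model.
        self.rects.append((left, top, right, bottom))


@dataclass
class Block:
    block_type: str
    region: Region
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class LayoutBlocks:
    items: List[Block] = field(default_factory=list)

    def add_new(self, block_type: str, region: Region) -> Block:
        # Plays the role of AddNew: creates the block and stores it.
        block = Block(block_type, region)
        self.items.append(block)
        return block


@dataclass
class Layout:
    blocks: LayoutBlocks = field(default_factory=LayoutBlocks)


# Step 1: obtain the page layout (here, simply a fresh in-memory layout).
layout = Layout()

# Step 2: create a region and add a rectangle to it.
region = Region()
region.add_rect(50, 50, 400, 120)

# Step 3: create a block of the required type and add it to the layout.
block = layout.blocks.add_new("text", region)

# Step 4: set block parameters ("note" is a made-up property).
block.properties["note"] = "page header"
```

In real code the same sequence would go through the page's Layout property, IEngine::CreateRegion, and the block type object for the chosen type; the exact signatures should be taken from the API reference.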
Changing the block type
The block type is defined when the block is created and cannot be changed afterwards. To change the type, delete the block and create another block of the required type in exactly the same place:
- Create a Region object using the IEngine::CreateRegion method and copy the region of the block you need to replace with the help of the IRegion::CopyFrom method.
- Delete the old block from the layout by calling the ILayoutBlocks::DeleteAt method.
- Create a new block of the required type and add it to the collection of layout blocks using the AddNew method of the LayoutBlocks object. Pass the Region you copied from the old block as one of the required parameters.
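The replacement procedure can be sketched in the same spirit. This is an in-memory model, not Engine code: the classes are hypothetical, copy_from, delete_at, and add_new only mimic the roles of IRegion::CopyFrom, ILayoutBlocks::DeleteAt, and AddNew, and replace_block_type is a helper invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Region:
    rects: List[Rect] = field(default_factory=list)

    def copy_from(self, other: "Region") -> None:
        # Copy the rectangles so the new region does not share state
        # with the region of the block that is about to be deleted.
        self.rects = list(other.rects)


@dataclass
class Block:
    block_type: str
    region: Region


@dataclass
class LayoutBlocks:
    items: List[Block] = field(default_factory=list)

    def add_new(self, block_type: str, region: Region) -> Block:
        block = Block(block_type, region)
        self.items.append(block)
        return block

    def delete_at(self, index: int) -> None:
        del self.items[index]


def replace_block_type(blocks: LayoutBlocks, index: int, new_type: str) -> Block:
    """Replace the block at `index` with a block of `new_type` in the same place."""
    saved = Region()
    saved.copy_from(blocks.items[index].region)  # copy the region first
    blocks.delete_at(index)                      # then delete the old block
    return blocks.add_new(new_type, saved)       # then add the new block


blocks = LayoutBlocks()
blocks.add_new("text", Region([(10, 10, 200, 80)]))
blocks.add_new("raster_picture", Region([(10, 100, 200, 300)]))

new_block = replace_block_type(blocks, 0, "table")
```

The region is copied before the old block is deleted so that its geometry is not lost. In this model the new block is appended to the end of the collection; where the Engine places it is not covered here.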
See also
Recognizing Checkmarks
Working with Text