Document XML Scheme
When recognizing a page, ABBYY FineReader Server first analyzes its layout and detects blocks of various types on the page. Each block belongs to one of the four types described below and has its own sequence number and region. A region is a set of rectangles on the image positioned one under another so that the top line of each lower rectangle is the bottom line of the rectangle above it; the rectangles therefore do not overlap. Blocks determine how and in what order the image areas are recognized.
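The region definition above can be illustrated with a short sketch. It is not part of the product API: the Rect type and the is_valid_region function are hypothetical names introduced here, and the coordinates are assumed to be integer pixel positions with the origin at the top-left corner of the image.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    """Hypothetical rectangle type: pixel coordinates, origin at top-left."""
    left: int
    top: int
    right: int
    bottom: int


def is_valid_region(rects: Sequence[Rect]) -> bool:
    """Check that rectangles are stacked as the region definition requires.

    Each rectangle must have a positive width and height, and the top line
    of every lower rectangle must equal the bottom line of the one above it.
    """
    if not rects:
        return False
    for r in rects:
        if r.left >= r.right or r.top >= r.bottom:
            return False
    # Compare each rectangle with the one directly below it.
    return all(upper.bottom == lower.top
               for upper, lower in zip(rects, rects[1:]))
```

For example, is_valid_region([Rect(0, 0, 100, 50), Rect(20, 50, 80, 120)]) holds because the second rectangle starts exactly where the first one ends, while a pair whose vertical ranges overlap is rejected.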
The following block types are supported:
Text - This is used for text image areas and should only contain single-column text. The recognized text is enclosed in text tags in the XML file. Text is represented as a set of paragraphs (each paragraph is enclosed in par tags). In a paragraph, each line is marked by line tags. For each line, formatting attributes are given in formatting tags. Character attributes are represented as attributes of the charParams tag.
Table - This is used for table image areas or for areas of text that have the structure of a table. The recognized table is represented in the XML file by a set of rows (row tags). In a row, each cell is marked by cell tags. Cell text is enclosed in text tags.
Picture - This is used for image areas that contain pictures. This type of block may enclose an actual picture or any other object that should be displayed as a picture (e.g. a section of text). A picture block is only represented as a block region (region tags) in the XML file.
Barcode - This is used for barcode image areas. The recognized barcode is represented in the XML file by the barcode value (if the LookForBarcodes property of the RecognitionParams object is set to TRUE). The barcode value is enclosed with text tags.
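The block structure described above can be walked with a short script. The following is a minimal sketch, not the authoritative format: the sample document is hand-written for illustration, and the block element, its blockType attribute, the rect element, and the placement of each recognized character as the text of a charParams element are assumptions made here. Consult ExportToXml.xsd for the actual element and attribute names. The tags text, par, line, formatting, charParams, row, cell, and region are the ones named in the descriptions above.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element nesting and attribute names are assumptions.
SAMPLE = """<document>
  <page>
    <block blockType="Text">
      <region><rect l="10" t="10" r="200" b="60"/></region>
      <text><par><line><formatting>
        <charParams>H</charParams><charParams>i</charParams>
      </formatting></line></par></text>
    </block>
    <block blockType="Table">
      <row>
        <cell><text><par><line><formatting>
          <charParams>A</charParams>
        </formatting></line></par></text></cell>
        <cell><text><par><line><formatting>
          <charParams>B</charParams>
        </formatting></line></par></text></cell>
      </row>
    </block>
    <block blockType="Picture">
      <region><rect l="10" t="80" r="200" b="160"/></region>
    </block>
  </page>
</document>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_all(elem, name):
    """Return all descendants (and elem itself) whose local tag is name."""
    return [e for e in elem.iter() if local(e.tag) == name]


def text_lines(elem):
    """Join the characters of every line found under elem."""
    return ["".join(cp.text or "" for cp in find_all(line, "charParams"))
            for line in find_all(elem, "line")]


def summarize(xml_string):
    """Return a (block type, content) pair for each block in the document."""
    summary = []
    for block in find_all(ET.fromstring(xml_string), "block"):
        kind = block.get("blockType")
        if kind == "Table":
            # One list per row, one string per cell.
            rows = [[" ".join(text_lines(cell))
                     for cell in find_all(row, "cell")]
                    for row in find_all(block, "row")]
            summary.append((kind, rows))
        elif kind == "Picture":
            # A picture block carries only its region.
            summary.append((kind, len(find_all(block, "rect"))))
        else:
            # Text and Barcode blocks both enclose their value in text tags.
            summary.append((kind, text_lines(block)))
    return summary
```

Calling summarize(SAMPLE) yields one entry per block: the lines of the Text block, the cell strings of the Table block grouped by row, and the number of region rectangles of the Picture block. Tag names are matched without their namespace, so the sketch works whether or not the exported file declares one.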
For the XML schema of the output XML document, see the ExportToXml.xsd file, which can be found in the Help subfolder of the ABBYY FineReader Server installation folder (the default location is C:\Program Files\ABBYY FineReader Server 14.0\Help).
Note. When working with a page on the FineReader Server 14 Verification Station, blocks are shown as image areas enclosed in frames of different colors, as in the picture below.
The picture below shows Picture, Text, and Table blocks in the output XML file.
Description of Tags
See also
COM-based API: XMLExportSettings
Web Services API: XMLExportSettings
3/26/2024 1:49:49 PM