XML Schema Description

An XML file contains the recognized text together with additional information about its structure, attributes, and recognition variants, all described by XML tags. The table below describes the possible tags. Some tags may be absent, depending on the values of the export parameters. For example, word or character recognition variants are saved only if the corresponding properties of the XMLExportParams object are set to TRUE.
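Because the variant tags are optional, code that reads an exported file may want to check for them before relying on them. The following is a minimal Python sketch of such a check, not part of the Engine API. The sample document is hypothetical and heavily trimmed, the helper names are illustrative, and the position of charRecVariants inside charParams is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed export: real files carry more attributes and
# usually an XML namespace (local_name() below ignores it).
SAMPLE = """<document>
  <page>
    <block blockType="Text">
      <text><par><line><formatting>
        <charParams><charRecVariants>
          <charRecVariant/>
          <charRecVariant/>
        </charRecVariants>H</charParams>
        <charParams>i</charParams>
      </formatting></line></par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_variant_tags(xml_text):
    """Count the optional recognition-variant containers in a document."""
    counts = {"charRecVariants": 0, "wordRecVariants": 0}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in counts:
            counts[name] += 1
    return counts


VARIANT_COUNTS = count_variant_tags(SAMPLE)
print(VARIANT_COUNTS)
```

A count of zero for either tag suggests that the corresponding variants were not exported for that file.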

The XML schema is defined in the FineReader10-schema-v1.xsd file. This file is located in the Inc folder (Start > Programs > ABBYY FineReader Engine 12 > Installation Folders > Include Files Folder).

The picture below shows an example of Picture, Text, and Table blocks in the output XML file.

Description of document tags

| Name | Type | Multiplicity | Parent Tag | Description |
| --- | --- | --- | --- | --- |
| document | Complex Type; Type elements; Type attributes | 1 | none | Document. |
| page | Complex Type, a sequence of block tags; Type elements; Type attributes | 0...unbounded | document | Recognized page. |
| block | BlockType; BlockType elements; BlockType attributes | 0...unbounded | page | Recognized block. |
| region | Complex Type, a sequence of rect tags; Type elements; has no type attributes | 1 | block | Block region, a set of rectangles. |
| rect | Complex Type; Type attributes | 1...unbounded | region | Rectangle of a block region. |
| text | TextType; TextType elements; TextType attributes | 0...unbounded | block | Text of a recognized text block (present as an element of the block tag if the blockType attribute is "Text"). |
| text | TextType; TextType elements; TextType attributes | 0...unbounded | cell | Text of a table cell. |
| par | ParagraphType; ParagraphType elements; ParagraphType attributes | 0...unbounded | text | Paragraph of the recognized text. |
| line | LineType; LineType elements; LineType attributes | 0...unbounded | par | Line of a paragraph. |
| formatting | FormattingType; FormattingType group; FormattingType attributes | 0...unbounded | line | Group of characters with uniform formatting. Character attributes alternate with word recognition variants. The recognition variants of a word are written before the word. |
| charParams | CharParamsType; CharParamsType elements; CharParamsType attributes | 0...unbounded | formatting | Attributes of a single character. |
| charRecVariants | Complex Type, a sequence of charRecVariant tags; Type elements; has no type attributes | | charParams | Recognition variants of a character. |
| charRecVariant | CharRecognitionVariant; Type attributes | 0...unbounded | charRecVariants | One recognition variant of a character. |
| wordRecVariants | Complex Type, a sequence of wordRecVariant tags; Type elements; has no type attributes | | formatting | Recognition variants of the next word. |
| wordRecVariant | WordRecognitionVariant; WordRecognitionVariant elements; WordRecognitionVariant attributes | 0...unbounded | wordRecVariants | One recognition variant of the next word. |
| variantText | Complex Type, a sequence of charParams tags; Type elements; has no type attributes | 1 | wordRecVariant | Word. |
| row | TableRowType; TableRowType elements; has no type attributes | 0...unbounded | block | Table row (present if the blockType attribute is "Table"). |
| cell | Complex Type, a sequence of TextType tags; Type elements; Type attributes | 0...unbounded | row | Table cell (present if the blockType attribute is "Table"). |
| separatorsBox | Complex Type, a sequence of separator tags; Type elements; has no type attributes | 0...1 | block | Group of separators (present if the blockType attribute is "SeparatorsBox"). |
| separator | SeparatorBlockType; SeparatorBlockType elements; SeparatorBlockType attributes | 0...1 | block | Single separator (present if the blockType attribute is "Separator"). |
| separator | SeparatorBlockType; SeparatorBlockType elements; SeparatorBlockType attributes | 0...unbounded | separatorsBox | Separator in a group of separators. |
| barcodeInfo | BarcodeInfoType; BarcodeInfoType attributes | 0...1 | block | Information about a barcode (present if the blockType attribute is "Barcode"). |
| start | Point; Point attributes | 1 | separator | Start point of a separator. |
| end | Point; Point attributes | 1 | separator | End point of a separator. |
| documentData | Complex Type; Type elements; has no type attributes | 0...1 | document | Parameters of the paragraph and font styles of the document. |
| paragraphStyles | Complex Type, a sequence of paragraphStyle tags; Type elements; has no type attributes | 0...1 | documentData | Collection of paragraph formatting styles. |
| paragraphStyle | ParagraphStyleType; ParagraphStyleType elements; ParagraphStyleType attributes | 0...unbounded | paragraphStyles | Formatting style of a paragraph. |
| fontStyle | FontStyleType; FontStyleType attributes | 0...unbounded | paragraphStyle | Font style. |
| sections | Complex Type, a sequence of section tags; Type elements; has no type attributes | 0...1 | documentData | Collection of document sections. |
| section | SectionType; SectionType elements; has no type attributes | 0...unbounded | sections | Document section. |
| stream | TextStreamType; TextStreamType elements; TextStreamType attributes | 0...unbounded | section | Sequence of paragraphs and blocks. |
| mainText | Complex Type; Type attributes | 0...1 | stream | |
| elemId | Complex Type; Type attributes | 0...unbounded | stream | ID of a page element. |
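
The parent-child relations in the table suggest how a consumer could walk an exported file: page, then block, then either text > par > line > formatting > charParams for a Text block, or row > cell > text for a Table block. The following Python sketch illustrates this under several assumptions: the sample document, its namespace URI, and all helper names are hypothetical, and each charParams tag is assumed to hold its character as plain element text. Only charParams tags that are direct children of formatting are read, so any word recognition variants are skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; the namespace URI is a placeholder.
SAMPLE = """<document xmlns="urn:example:finereader-sketch">
  <page>
    <block blockType="Text">
      <text><par>
        <line><formatting><charParams>H</charParams><charParams>i</charParams></formatting></line>
        <line><formatting><charParams>y</charParams><charParams>o</charParams><charParams>u</charParams></formatting></line>
      </par></text>
    </block>
    <block blockType="Table">
      <row>
        <cell><text><par><line><formatting><charParams>A</charParams></formatting></line></par></text></cell>
        <cell><text><par><line><formatting><charParams>B</charParams></formatting></line></par></text></cell>
      </row>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct children of elem whose local tag name equals name."""
    return [c for c in elem if local_name(c.tag) == name]


def text_lines(text_elem):
    """One string per line tag: text > par > line > formatting > charParams."""
    lines = []
    for par in children(text_elem, "par"):
        for line in children(par, "line"):
            chars = []
            for fmt in children(line, "formatting"):
                for cp in children(fmt, "charParams"):
                    chars.append(cp.text or "")
            lines.append("".join(chars))
    return lines


def cell_text(cell):
    """All lines of a table cell joined with single spaces."""
    return " ".join(l for t in children(cell, "text") for l in text_lines(t))


def block_contents(xml_text):
    """Return (blockType, content) pairs for Text and Table blocks."""
    root = ET.fromstring(xml_text)
    result = []
    for page in children(root, "page"):
        for block in children(page, "block"):
            kind = block.get("blockType")
            if kind == "Text":
                lines = [l for t in children(block, "text") for l in text_lines(t)]
                result.append((kind, lines))
            elif kind == "Table":
                rows = [[cell_text(c) for c in children(row, "cell")]
                        for row in children(block, "row")]
                result.append((kind, rows))
    return result


CONTENTS = block_contents(SAMPLE)
print(CONTENTS)
```

Matching on local tag names keeps the sketch independent of the actual namespace declared in the schema file, at the cost of ignoring namespaces altogether.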

Tag hierarchy diagram
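
The nesting can also be derived from the Parent Tag column of the table above. The following Python sketch is illustrative only: it lists the (tag, parent) pairs taken from that column and prints them as an indented tree. The names PARENT_OF, build_children, and render are hypothetical helpers, not part of any ABBYY API. Tags with two parents (text and separator) appear twice in the output.

```python
from collections import defaultdict

# (tag, parent) pairs copied from the Parent Tag column of the table.
PARENT_OF = [
    ("document", None), ("page", "document"), ("block", "page"),
    ("region", "block"), ("rect", "region"),
    ("text", "block"), ("text", "cell"),
    ("par", "text"), ("line", "par"), ("formatting", "line"),
    ("charParams", "formatting"),
    ("charRecVariants", "charParams"), ("charRecVariant", "charRecVariants"),
    ("wordRecVariants", "formatting"), ("wordRecVariant", "wordRecVariants"),
    ("variantText", "wordRecVariant"),
    ("row", "block"), ("cell", "row"),
    ("separatorsBox", "block"),
    ("separator", "block"), ("separator", "separatorsBox"),
    ("barcodeInfo", "block"),
    ("start", "separator"), ("end", "separator"),
    ("documentData", "document"),
    ("paragraphStyles", "documentData"), ("paragraphStyle", "paragraphStyles"),
    ("fontStyle", "paragraphStyle"),
    ("sections", "documentData"), ("section", "sections"),
    ("stream", "section"), ("mainText", "stream"), ("elemId", "stream"),
]


def build_children(pairs):
    """Map each parent tag to the list of its child tags, in table order."""
    kids = defaultdict(list)
    for tag, parent in pairs:
        kids[parent].append(tag)
    return kids


def render(tag, kids, depth=0):
    """Return the subtree rooted at tag as indented lines."""
    lines = ["  " * depth + tag]
    for child in kids.get(tag, []):
        lines.extend(render(child, kids, depth + 1))
    return lines


KIDS = build_children(PARENT_OF)
TREE = render("document", KIDS)
print("\n".join(TREE))
```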

See also

XMLExportParams

24.03.2023 8:51:52
