Document Analysis in Data Capture

Document analysis refers to methods for automatically identifying the components of a document. An important feature of the data capture scenario is that only certain fields are recognized. ABBYY FlexiCapture SDK imitates the way humans recognize objects. To find the required data, a human operator looks for fields on the document, locates each field, and analyzes the area around it. Our product does the same. It finds the required fields on flexible forms by using a special formalized description called a FlexiLayout™, which is created with a special visual tool, ABBYY FlexiLayout Studio. The program then analyzes the area surrounding each element and makes inferences about the nature of the fields and their content. The system can find fields elsewhere, using any information available: relation to other objects on the page, contents of the field, its size, lines drawn around it, etc.
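The idea of finding a field through its relation to another object on the page can be illustrated with a small, generic sketch. It is not the FlexiCapture SDK API: the `Word` type, the `find_value_right_of` function, and the `max_gap` parameter are hypothetical names chosen for illustration, and the input is assumed to be a list of already recognized words with pixel bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word and its bounding box in page pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_value_right_of(words, label, max_gap=300):
    """Return the text of the nearest word to the right of `label`, or None.

    A candidate must start at or after the label's right edge, lie within
    `max_gap` pixels of it, and overlap the label vertically (same text line).
    """
    anchor = next((w for w in words if w.text.lower() == label.lower()), None)
    if anchor is None:
        return None
    candidates = [
        w for w in words
        if w is not anchor
        and 0 <= w.left - anchor.right <= max_gap
        and w.top < anchor.bottom and w.bottom > anchor.top
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.left - anchor.right).text


# Made-up sample data: the label "Total:" with a value printed to its right.
words = [
    Word("Invoice", 40, 20, 120, 40),
    Word("Total:", 40, 200, 100, 220),
    Word("149.90", 130, 201, 200, 221),
    Word("Thank", 40, 260, 100, 280),
]
print(find_value_right_of(words, "Total:"))  # prints 149.90
```

The sketch uses only one kind of evidence (position relative to a label). The description above lists several more, such as field contents, size, and surrounding lines, which a real layout description would combine.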

Development with ABBYY FlexiCapture SDK is usually done in two steps:

  1. You should analyze the nature of the documents to be used for data extraction and create appropriate Document Definitions, which can be based on either Fixed Form Definitions or FlexiLayouts.
  2. After that, you can integrate the Engine into your application.

Development of Document Definitions for fixed forms

The Document Definition Editor (a part of ABBYY FlexiCapture 12) allows fast and intuitive development of Document Definitions to process static, fixed forms.

  1. Load the different segments of the multipage form into the editor.
  2. Define the elements that are used to match the document: anchors, static texts and separators.
  3. In the graphical editor, define the recognition areas where text blocks, tables, checkmarks, checkmark groups, barcodes, pictures, and similar objects are located.
  4. Set up the recognition properties for each area (e.g. OCR or ICR) and attach data type definitions, dictionaries, and verification rules.
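To make the role of anchors on a fixed form more concrete, the following generic sketch shows one common way an anchor can be used: the difference between where the anchor is expected and where it is detected on a scan is applied to every recognition area. This is an assumption about the general technique, not a description of how the Document Definition Editor or the Engine works internally; `Region`, `shift_regions`, and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    """A recognition area on the form template, in pixels (hypothetical type)."""
    name: str
    left: int
    top: int
    width: int
    height: int
    kind: str = "text"  # e.g. "text", "checkmark", "barcode"


def shift_regions(regions, expected_anchor, detected_anchor):
    """Move every region by the offset between expected and detected anchor (x, y)."""
    dx = detected_anchor[0] - expected_anchor[0]
    dy = detected_anchor[1] - expected_anchor[1]
    return [replace(r, left=r.left + dx, top=r.top + dy) for r in regions]


# Made-up template with two areas; the scan is shifted 8 px right and 4 px up.
template = [
    Region("customer_name", 100, 150, 400, 40),
    Region("newsletter_opt_in", 100, 220, 30, 30, kind="checkmark"),
]
shifted = shift_regions(template, expected_anchor=(50, 50), detected_anchor=(58, 46))
for region in shifted:
    print(region.name, region.left, region.top)
```

A single anchor only corrects translation. Correcting rotation or scaling would need at least two reference points, which is one reason a form definition typically uses several matching elements.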

You can find detailed instructions for creating fixed layouts in ABBYY FlexiCapture 12 Help.

Development of Document Definitions for flexible documents

The ABBYY FlexiLayout Studio user interface is designed to simplify FlexiLayout creation by guiding the developer through a set of dialog boxes. In complicated cases that require more detailed customization and assistance, FlexiLayout Studio provides direct access to its internal structural language for greater flexibility and control. (ABBYY FlexiLayout Studio is supplied together with ABBYY FlexiCapture 12.)

  1. Load a selection of documents with different layouts.
  2. Define some generic elements that allow identifying documents and that can be used for orientation within one document, e.g. text strings, lines, spaces between elements.
  3. Define search elements for the data you are looking for, e.g. text, numbers, dates, tables, the length of the string, the set of characters, one or multiple words, one or multiple lines.

Additionally, these elements are placed in relation to the areas set up in step 2, for example to the right of or below them.
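The combination of content constraints (character set, string length) and a spatial relation such as "right" or "below" can be sketched in generic Python. This is not the FlexiLayout language or any ABBYY API: `SearchElement`, `RELATIONS`, and `find_field` are hypothetical names, and boxes are assumed to be `(left, top, right, bottom)` tuples in pixels.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchElement:
    """Content constraints for a field (hypothetical type)."""
    name: str
    allowed_chars: str  # body of a regex character class, e.g. "0-9"
    min_length: int
    max_length: int

    def matches(self, text):
        """True if the whole text uses only allowed characters and fits the length range."""
        pattern = rf"[{self.allowed_chars}]{{{self.min_length},{self.max_length}}}"
        return re.fullmatch(pattern, text) is not None


# Spatial relations between an anchor box and a candidate box.
RELATIONS = {
    "right": lambda anchor, box: box[0] >= anchor[2],
    "below": lambda anchor, box: box[1] >= anchor[3],
}


def find_field(element, relation, anchor_box, candidates):
    """Return the first candidate text that satisfies both the relation and the constraints."""
    in_relation = RELATIONS[relation]
    for text, box in candidates:
        if in_relation(anchor_box, box) and element.matches(text):
            return text
    return None


# Made-up sample: the anchor is the label "Invoice No."; candidates are nearby strings.
invoice_number = SearchElement("invoice_number", "0-9", 6, 8)
anchor_box = (40, 100, 140, 120)
candidates = [
    ("2023-08-15", (400, 20, 500, 40)),
    ("INV", (150, 100, 180, 120)),
    ("0004711", (190, 100, 260, 120)),
]
print(find_field(invoice_number, "right", anchor_box, candidates))  # prints 0004711
```

Here the date and the "INV" prefix both lie to the right of the anchor but fail the content constraints, so only the seven-digit string is accepted. With the relation "below", no candidate qualifies and the function returns `None`.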

You can find detailed instructions for creating FlexiLayouts in ABBYY FlexiCapture 12 Help.

See also

Key Features

15.08.2023 13:19:30
