Creating Document Definitions for semi-structured and unstructured documents

ABBYY FlexiCapture can be used to process unstructured documents, in which information is presented in a free style: for example, contracts, letters, orders, and annexes. Unstructured documents with text or images, separated by blank sheets or pages with barcodes, are processed and exported to searchable PDF files or graphic files.

Processing such documents normally involves converting them to electronic form and running a search according to key field values.

If possible, the search for key fields (such as a contract number) in such documents is performed using a FlexiLayout created with ABBYY FlexiLayout Studio. See Creating a Document Definition for semi-structured document processing.

Natural language processing (NLP) can also be used to process unstructured documents. This technology uses NLP models to extract information from text.

If key fields cannot be found automatically, the Operator can enter their values manually. To allow this, create a Document Definition with one field (or several fields, if necessary), and enable the option Don't recognize (key from image field - will be entered manually) in the recognition properties of each such field. During verification, the Operator will then be able to type in the key field values.

You need to configure export to enable data storage: you can export key field values to a file or a database and save document images in a convenient format. Document images can be saved as graphic files or searchable PDF files.

Pay close attention when assembling pages into documents: with unstructured documents, it can be difficult to determine which document a particular page belongs to. To automate the assembly of unstructured documents, we recommend separating documents with blank sheets or pages with barcodes. When adding images to a batch (by scanning, adding from file, or creating an import profile), enable the option For images separated by and select the value blank pages or pages with barcode from the drop-down list, depending on which pages are to be used as separators. Pages are then assembled into documents automatically: pages are added to the current document until the next separator page is encountered. For details, see Adding page images.
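
The assembly rule described above (pages are added to the current document until the next separator page) can be illustrated with a short sketch. This is not ABBYY FlexiCapture code or its API. The function assemble_documents and the is_separator callback are hypothetical names introduced only for illustration, and the sketch assumes that separator pages are not kept in the resulting documents and that consecutive separators do not produce empty documents.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def assemble_documents(
    pages: Iterable[Page],
    is_separator: Callable[[Page], bool],
) -> List[List[Page]]:
    """Group a flat sequence of pages into documents.

    Illustrative only: is_separator is a hypothetical callback that
    returns True for a blank page or a barcode page, depending on which
    kind of separator was chosen.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # Assumption: the separator closes the current document
            # and is not itself stored in any document.
            if current:
                documents.append(current)
            current = []
        else:
            # Ordinary page: it belongs to the current document.
            current.append(page)
    # Close the last document if the batch did not end with a separator.
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    batch = ["p1", "p2", "BLANK", "p3", "BLANK", "BLANK", "p4", "p5"]
    for number, doc in enumerate(
        assemble_documents(batch, lambda p: p == "BLANK"), start=1
    ):
        print(number, doc)
```

In this sketch a batch of eight scanned pages with three blank separators yields three documents. How the product actually treats separator pages and repeated separators may differ, so check the behavior on a test batch.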

