Sample 2. Step 6: Creating a document identifier
When processing semi-structured documents in ABBYY FlexiCapture, you will normally want to exclude documents that do not belong to the current type. One way to identify a document is to mark at least one element as required.
In this particular case, the document heading will make a good identifier element, since it contains distinct text that can be easily read by the OCR engine.
Note. An identifier element or set of elements can be described in a predefined Header element (not used in this sample).
The document heading will be used solely to identify the document as belonging to the given type and will not be recognized in ABBYY FlexiCapture. In the FlexiLayout, describe the document heading as an element of type Static Text:
- Click the FlexiLayout tab in the program main window.
- Select SearchElements in the FlexiLayout tree.
- Select FlexiLayout → Add Elements → Static Text, or choose the Static Text command from the shortcut menu of the element.
- In the Name field, type a name for the element, e.g. FormHeader.
- On the General tab, select Required element, because the heading must be found for a document to be identified as belonging to this type.
- Click the Static Text tab.
- In the Search text field, type the text to find.
The batch contains test documents that have different headings: Easiest Recipes or Easy to Cook Recipes. Enter both headings.
The headings are written in one line on all the test images. Therefore, you can type the headings without spaces, which speeds up the search for single-line static text. Separate the two alternative headings with "|", for example: EasiestRecipes|EasytoCookRecipes.
- Set the maximum number of errors that the detected text may contain (either as a percentage or as an absolute number). In this particular case we recommend setting the Max error percentage to 20, allowing 5 errors among the 25 characters of the document heading.
Note. The maximum number of errors is determined by trial and error.
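To get a feel for what the search text and the Max error percentage settings mean, the matching idea can be sketched in Python. This is only an illustration under stated assumptions, not the way ABBYY FlexiLayout Studio implements its search: the function names (edit_distance, allowed_errors, matches_identifier) are hypothetical, errors are counted as Levenshtein edits, the comparison is case-sensitive, and the percentage is rounded down to a whole number of errors.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def allowed_errors(text_length: int, max_error_percent: float) -> int:
    """Errors permitted for a text of the given length (assumption: round down)."""
    return math.floor(text_length * max_error_percent / 100)


def matches_identifier(recognized: str, search_text: str,
                       max_error_percent: float) -> bool:
    """Return True if the recognized line matches any "|"-separated
    alternative in search_text within the error budget.
    Spaces are ignored on both sides, mirroring the space-free search text."""
    candidate = recognized.replace(" ", "")
    for alternative in search_text.split("|"):
        target = alternative.replace(" ", "")
        budget = allowed_errors(len(target), max_error_percent)
        if edit_distance(candidate, target) <= budget:
            return True
    return False


SEARCH_TEXT = "EasiestRecipes|EasytoCookRecipes"

if __name__ == "__main__":
    print(allowed_errors(25, 20))  # 20% of 25 characters
    print(matches_identifier("Easy to Cook Rec1pes", SEARCH_TEXT, 20))
    print(matches_identifier("Invoice No. 12345", SEARCH_TEXT, 20))
```

In this sketch, a heading with one misread character still identifies the document, while an unrelated line does not. Trying a few percentages against sample lines in this way is one informal means of exploring the trade-off described in the note above: a higher limit tolerates more recognition errors but makes false matches more likely.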
5/25/2023 7:55:03 AM