Identifying and processing FlexiLayouts in ABBYY FlexiCapture
You can process an unlimited number of 'fixed' and 'flexible' Document Definitions in one FlexiCapture batch (a 'flexible' Document Definition is created from a FlexiLayout). If a batch contains several flexible Document Definitions, it is best to use the same pre-recognition parameters (language, text type, mode) in all the FlexiLayouts from which they are created. Pre-recognition is then run only once, and its results are reused for the other Document Definitions. This reduces processing time, because pre-recognition usually takes up to 90% of the time required to match a Document Definition.
FlexiCapture runs pre-recognition on a page once for each distinct set of pre-recognition parameters specified in the flexible Document Definitions in the batch. If all the Document Definitions have the same pre-recognition parameters, pre-recognition is run only once, and the detected objects are saved and reused for the other Document Definitions in the batch. If one of the Document Definitions has a different pre-recognition language (or text type, or mode), FlexiCapture has to pre-recognize the page twice, which doubles the time required for Document Definition matching. For this reason, we recommend keeping the number of distinct sets of pre-recognition parameters to a minimum.
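The relationship between parameter sets and pre-recognition passes can be illustrated with a small model. This is only a sketch of the rule described above, not FlexiCapture code: the PreRecognitionParams type, the count_prerecognition_passes function, the Document Definition names, and the parameter values are all hypothetical.

```python
from typing import NamedTuple


class PreRecognitionParams(NamedTuple):
    """Hypothetical model of one set of pre-recognition parameters."""
    language: str
    text_type: str
    mode: str


def count_prerecognition_passes(definitions: dict[str, PreRecognitionParams]) -> int:
    """Passes per page = number of distinct parameter sets in the batch."""
    return len(set(definitions.values()))


# Illustrative values only; real parameter names and values may differ.
same = {
    "Invoice": PreRecognitionParams("English", "Typographic", "Normal"),
    "Order": PreRecognitionParams("English", "Typographic", "Normal"),
}
# One Document Definition with a different language adds a second pass.
mixed = dict(same, Receipt=PreRecognitionParams("German", "Typographic", "Normal"))

print(count_prerecognition_passes(same))   # 1
print(count_prerecognition_passes(mixed))  # 2
```

In this model, aligning the parameters of the third Document Definition with the other two would bring the count back to one pass per page.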
In some cases, you can speed up FlexiLayout matching by skipping pre-recognition. This is possible if the FlexiLayout elements include only Separator, Barcode, White Gap, Region and Object Collection elements and all the White Gap and Object Collection elements meet one of the following requirements:
- no text is specified in the element's search constraints;
- the UseRawText property of the element is true.
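The conditions above can be expressed as a simple check. The following is a minimal sketch that models a FlexiLayout as a list of elements; the Element class, its field names, and the can_skip_prerecognition function are assumptions made for illustration and are not part of any FlexiCapture API.

```python
from dataclasses import dataclass

# Element types that do not require pre-recognition, as listed above.
ALLOWED_TYPES = {"Separator", "Barcode", "White Gap", "Region", "Object Collection"}
# Element types that must additionally satisfy one of the two requirements.
TEXT_SENSITIVE_TYPES = {"White Gap", "Object Collection"}


@dataclass
class Element:
    """Hypothetical model of a FlexiLayout element."""
    type: str
    has_text_constraint: bool = False  # text is specified in the search constraints
    use_raw_text: bool = False         # models the UseRawText property


def can_skip_prerecognition(elements: list[Element]) -> bool:
    """Return True if every element satisfies the conditions described above."""
    for element in elements:
        if element.type not in ALLOWED_TYPES:
            return False
        if element.type in TEXT_SENSITIVE_TYPES:
            # One of the two requirements must hold: no text constraint, or UseRawText is true.
            if element.has_text_constraint and not element.use_raw_text:
                return False
    return True


layout = [Element("Barcode"), Element("White Gap", has_text_constraint=True, use_raw_text=True)]
print(can_skip_prerecognition(layout))  # True
```

A layout that contains any other element type, such as Static Text, fails the check in this model, as does a White Gap element with a text constraint and UseRawText set to false.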
When fixed and semi-structured documents are processed within one batch, the program attempts to match the fixed Document Definitions first. If the fixed Document Definitions are successfully matched with their documents, no FlexiLayouts are applied. If no fixed Document Definition matches, the program looks for suitable matches among the FlexiLayouts.
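The matching order can be pictured as a two-stage lookup. The sketch below is a simplified model that assumes each Document Definition can be represented as a name paired with a matching predicate; the select_definition function and the sample definitions are hypothetical and do not reflect how FlexiCapture implements matching internally.

```python
from typing import Callable, Optional, Sequence, Tuple

# A hypothetical Document Definition: a name plus a predicate that tests a page.
Definition = Tuple[str, Callable[[str], bool]]


def select_definition(
    page: str,
    fixed: Sequence[Definition],
    flexible: Sequence[Definition],
) -> Optional[str]:
    """Try fixed Document Definitions first; fall back to flexible ones."""
    for name, matches in fixed:
        if matches(page):
            return name  # a fixed match means no FlexiLayout is applied
    for name, matches in flexible:
        if matches(page):
            return name
    return None  # nothing matched


fixed_defs = [("TaxForm", lambda page: "FORM 1040" in page)]
flexible_defs = [("Invoice", lambda page: "Invoice" in page)]

print(select_definition("FORM 1040 Invoice", fixed_defs, flexible_defs))  # TaxForm
print(select_definition("Invoice No. 17", fixed_defs, flexible_defs))     # Invoice
```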
Classifiers
Classifiers are used to automate the selection of a FlexiLayout or a layout alternative. A classifier is a special project created in FlexiLayout Studio and imported into FlexiCapture. The project describes the tree-like structure of the classes to which a document may belong. Each class contains a set of elements that identifies a certain type of document. Upon classification, the names of the FlexiLayouts (or layout alternatives) to be used are saved in the properties of each classified page. For more information about classification projects, refer to Classifier project.
Selecting a FlexiLayout without the use of a classifier
If no classifiers are used, identifier elements are created in the FlexiLayout to make FlexiCapture's selection of the appropriate FlexiLayout more reliable and faster. Practically any type of element can be used as an identifier. The only requirement is that it should be reliably detected on all of the documents of the given type. In practice, the most commonly used identifiers are Static Text, Barcode, and Character String elements. Sometimes more than one element is used to identify a document type.
The higher the identifier element in the FlexiLayout tree, the faster the Document Definition selection.
One way to create an identifier element is to clear the Optional property of the element. The object corresponding to the element must then be present on all of the images. If the object described by the element is not detected, the Document Definition will not be matched with the image.
Additionally, you can identify a document by using the DontFind() function in the Advanced pre-search relations field. This will tell the program not to look for an optional element.
Another method of identifying a flexible Document Definition is to use the Quality property of the element hypothesis. Setting the Quality of any element to 0 in the Advanced post-search relations field (the corresponding code is Quality: 0;) will result in failure to match the Document Definition. Before setting the quality of an element to 0, you need to analyze the properties of the elements located above the current element in the FlexiLayout tree.
12.04.2024 18:16:02