Training ABBYY FlexiCapture for Invoices
The quality of data capture can be improved by training ABBYY FlexiCapture for Invoices. Users can train the program either prior to or while processing documents.
Training is available only for the following kinds of images and in the following cases:
- Images on which fields cannot be reliably detected by the pre-configured Document Definition.
- The user needs to extract data from fields not defined in the default Document Definitions.
When training may be required
- Field location varies among documents of the same type
In a stream of documents of the same type, unusual documents may appear on which field locations differ from those on the standard document. If ABBYY FlexiCapture for Invoices cannot reliably detect fields on such documents, the user can indicate the correct field locations and the program will "learn" how to detect these fields.
Note: The exact document layout depends on the issuing company. For this reason, documents from different companies are treated as different variants and ABBYY FlexiCapture for Invoices will self-train separately on each document variant.
- The user needs to extract fields not defined by default
Apart from the main and additional fields supported by default, the user may need to capture data from fields not yet known to the program. This can be achieved by creating custom fields in the Document Definition and specifying their locations on document images.
How training works
During normal document processing or in a special training mode, the user adds document images to be used for training the program. The added documents are automatically recognized and submitted for verification. The user verifies the documents and corrects field locations wherever required, thus creating a reference layout.
Next, the trained document is placed into the training batch created for the given document variant. The administrator can view the list of all document variants and their training batches by clicking Open Field Extraction Training Batches. Once the first training document is added for a document variant, the program begins to accumulate documents in the respective training batch.
To initiate the training process, a training batch must contain at least one document. If clustering is used, a separate FlexiLayout will be created for each cluster; otherwise, a FlexiLayout will be created for each company (see Training with clustering for more information).
Training creates a FlexiLayout variant, which will be used for all documents that belong to the given document variant (e.g. for invoices from a particular vendor or for purchase orders from a particular customer).
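The grouping rule described above can be illustrated with a minimal sketch. It is not ABBYY FlexiCapture code: the TrainingDocument class, its fields, and the group_for_training function are hypothetical names introduced only for illustration. The sketch assumes that every document already carries a company name and, when clustering is used, a cluster label.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingDocument:
    # Hypothetical record; not part of any ABBYY API.
    name: str
    company: str
    cluster: Optional[str] = None


def group_for_training(docs: List[TrainingDocument],
                       use_clustering: bool) -> Dict[str, List[str]]:
    """Group document names by cluster if clustering is used, else by company.

    Each resulting group stands for one FlexiLayout to be trained.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        if use_clustering:
            if doc.cluster is None:
                # Assumption: with clustering on, every document has a cluster.
                raise ValueError(f"{doc.name} has no cluster label")
            key = doc.cluster
        else:
            key = doc.company
        groups[key].append(doc.name)
    return dict(groups)


docs = [
    TrainingDocument("inv1.tif", "Acme", "layout-1"),
    TrainingDocument("inv2.tif", "Acme", "layout-2"),
    TrainingDocument("inv3.tif", "Globex", "layout-1"),
]
by_company = group_for_training(docs, use_clustering=False)
by_cluster = group_for_training(docs, use_clustering=True)
```

In this example the same three images form two groups by company (Acme and Globex) but a different pair of groups by cluster, which is why the number of FlexiLayouts created can differ between the two modes.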
Once the training completes, the program automatically tests the FlexiLayout variant on all the sample documents.
The quality of the new FlexiLayout variant is determined by comparing the recognition results with the reference layout specified by the user (the program determines the quality of the main FlexiLayout, which is used when no training has been done, in the same manner). Next, the quality of the new FlexiLayout variant is compared with the quality of the previous FlexiLayout variant or with the quality of the main FlexiLayout:
- If the quality of the new FlexiLayout variant is worse than the quality of the old FlexiLayout variant or the main FlexiLayout, the new FlexiLayout variant is not saved and the user sees a corresponding message in the Train Document Definition window.
- If the quality of the new FlexiLayout variant is better than the quality of the old FlexiLayout variant or the main FlexiLayout, it is saved and used to process this document variant.
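The comparison above can be sketched as follows. The actual quality metric used by the program is not described here, so the sketch assumes a simple one: the share of reference fields whose detected region exactly equals the region in the reference layout. The names Box, Layout, layout_quality, and decide are hypothetical. The case of equal quality is not covered by the rules above, so the sketch reports it as unspecified instead of guessing.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical field region: left, top, right, bottom
Layout = Dict[str, Box]          # field name -> region on one document


def layout_quality(detected: List[Layout], reference: List[Layout]) -> float:
    """Share of reference fields detected at exactly the reference region.

    Assumed metric for illustration only; the real metric is not documented here.
    """
    total = 0
    matched = 0
    for found, expected in zip(detected, reference):
        for name, box in expected.items():
            total += 1
            if found.get(name) == box:
                matched += 1
    return matched / total if total else 0.0


def decide(new_quality: float, baseline_quality: float) -> str:
    """Apply the two documented rules; equal quality is left unspecified."""
    if new_quality > baseline_quality:
        return "save"
    if new_quality < baseline_quality:
        return "discard"
    return "unspecified"


reference = [{"Total": (10, 20, 90, 40), "Date": (10, 50, 90, 70)}]
old_result = [{"Total": (10, 20, 90, 40)}]   # previous variant misses the Date field
new_result = [{"Total": (10, 20, 90, 40), "Date": (10, 50, 90, 70)}]

old_q = layout_quality(old_result, reference)
new_q = layout_quality(new_result, reference)
outcome = decide(new_q, old_q)
```

Here the new variant finds both reference fields while the previous one finds only one, so the new variant would be saved.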
ABBYY FlexiCapture for Invoices can be trained either by the administrator or by operators with sufficient permissions. To train the program, the user needs to add document images to the working batch, which will then be automatically recognized and submitted for verification. Once the user verifies the results and corrects field locations where required, the documents will be added to the training batch used for the given document variant. When the program has a sufficient number of documents for the document variant, training will begin automatically. The program will use the knowledge obtained through training to recognize all future documents that belong to this document variant.
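The accumulate-then-train behavior described above can be pictured with a small sketch. The TrainingBatch class, the min_documents threshold, and the on_train callback are hypothetical; the text states only that a batch must contain at least one document and that training starts once the number of documents is sufficient. Whether training repeats for every later document is not stated, and the sketch simply assumes that it does.

```python
from typing import Callable, List, Optional


class TrainingBatch:
    """Hypothetical per-variant batch that triggers training automatically."""

    def __init__(self, variant: str, min_documents: int = 1,
                 on_train: Optional[Callable[[str, List[str]], None]] = None):
        if min_documents < 1:
            raise ValueError("a training batch needs at least one document")
        self.variant = variant
        self.min_documents = min_documents  # assumed threshold, not a documented setting
        self.on_train = on_train
        self.documents: List[str] = []

    def add_verified(self, document: str) -> bool:
        """Add a verified document; return True if training was triggered."""
        self.documents.append(document)
        if len(self.documents) < self.min_documents:
            return False
        if self.on_train is not None:
            # Pass a copy so the callback cannot alter the batch contents.
            self.on_train(self.variant, list(self.documents))
        return True


runs = []
batch = TrainingBatch("Vendor A", min_documents=2,
                      on_train=lambda variant, docs: runs.append((variant, docs)))
first = batch.add_verified("a.tif")
second = batch.add_verified("b.tif")
```

With a threshold of two, the first verified document only accumulates in the batch and the second one triggers training for the variant.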
If an operator fails to achieve a sufficient degree of accuracy through training, the administrator can click Open Field Extraction Training Batches and continue training the program manually. The administrator can additionally:
- Add or remove document images used for training.
- Create new training batches.
- Add document images that will not be used for training, but will be used when testing the trained FlexiLayout variant.
- Export the trained FlexiLayout variant to ABBYY FlexiLayout Studio or import another FlexiLayout from ABBYY FlexiLayout Studio.
Once the desired degree of accuracy is achieved, the administrator can prohibit operator training for the given document variant.