Training while processing documents
ABBYY FlexiCapture for Invoices lets you improve recognition quality while processing documents. If the program fails to detect the correct location of a field on a document image, an Operator can specify the correct location and the program will use it when recognizing other documents.
Training is only available if ABBYY FlexiCapture can reliably identify the vendor by finding the corresponding record in a vendor database. If you have no vendor databases but still want to use field training, you can accumulate company information by adding records to your data sets while capturing invoices. For more information, see Looking up vendors and business units in the database.
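The prerequisite above can be pictured as a simple lookup: an invoice is eligible for training only when its vendor resolves to a database record, and saving a vendor record while capturing makes later invoices from that vendor eligible. The following Python sketch is a conceptual model only, not an ABBYY FlexiCapture API. `VendorDatabase`, `training_available`, and keying records by a VAT ID are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VendorDatabase:
    """Hypothetical in-memory stand-in for a vendor data set (not an ABBYY API)."""
    # Assumption: records are keyed by the vendor's VAT ID.
    records: Dict[str, str] = field(default_factory=dict)  # VAT ID -> vendor name

    def lookup(self, vat_id: str) -> Optional[str]:
        # Returns the vendor name if a matching record exists, otherwise None.
        return self.records.get(vat_id)

    def save(self, vat_id: str, name: str) -> None:
        # Models an Operator saving a manually typed vendor to the database.
        self.records[vat_id] = name


def training_available(db: VendorDatabase, vat_id: str) -> bool:
    # Training is allowed only when the vendor is found in the database.
    return db.lookup(vat_id) is not None


db = VendorDatabase()
print(training_available(db, "VAT-0001"))  # vendor not yet in the data set
db.save("VAT-0001", "Example Supplies Ltd")  # made-up vendor for illustration
print(training_available(db, "VAT-0001"))  # record accumulated while capturing
```

The first call prints `False` and the second prints `True`, which mirrors accumulating vendor records during capture when no vendor database exists up front.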
This article explains how to train ABBYY FlexiCapture for Invoices using the locally installed version of the Verification Station, and covers some training-related issues that Operators should be aware of.
- Collect a batch of invoices (e.g. the invoices processed within the past month) and start feeding them to the program. See How to capture invoices.
- Once the documents are fed to the program, they are recognized automatically and the data are checked by validation rules. Automatic recognition takes place only if the Recognize added images automatically option is enabled on the Document Processing tab of the Options dialog box (to open this dialog box, click Tools → Options...).
- If the status of a recognized invoice is anything other than Valid, or if you have any other reason to believe that the program failed to detect some of the fields, open the document in the document editor.
- Review the document form. The Vendor group of fields must be filled out correctly.
Training is done independently for each Document Variant. Invoices from the same vendor are considered to belong to the same Document Variant. If the vendor is detected incorrectly, the wrong Document Variant will be selected for the invoice during training. If the program fails to detect the vendor, select the correct vendor using the Vendor Lookup feature. If you cannot find the vendor in the database, type in the name manually, exactly as it appears on the image, and save it to the database by clicking Save.
Depending on your project settings, you may also have to specify the unique ID of the invoice's vendor so that the program can train on that invoice. To do this, type the unique ID in the VATID field (this field may have a different name in some localized projects). The VATID is a unique identification number assigned to companies for value-added tax purposes.
If documents originating from the same vendor have widely varying layouts, you should use the clustering feature. For details, see Training with clustering.
- Training will only be successful if the regions of all the fields are marked up correctly, so make sure that the regions match the actual locations of their respective fields on the image. See Training line items for more information on how to mark up line items on an invoice.
To do this, adjust the existing regions in the image window of the document editor, or draw regions for the fields that the program failed to detect.
After that, the program will analyze the document. If the markup of the field regions was modified and training for this vendor is not prohibited, the document will be added to the batch.
How to change the region of a field
- Position the mouse pointer over the desired field on the data form, find the corresponding region on the image (it is highlighted in blue), and click it (or draw a rectangle with the mouse);
- Position the mouse pointer over the desired region on the image (it is highlighted in blue), click it (or draw the region with the mouse), and then select the corresponding field from the drop-down list that opens;
- Adjust the position of a region on the image by dragging its boundaries with the mouse;
- Delete an incorrectly located region from the image: position the mouse pointer over its rectangle and, when a red cross appears in the top-right corner, click the cross. The markup of the region is deleted. Then create a new region for this field in the correct location;
- On the data form, start typing a value into a field. A drop-down list will be displayed listing the words captured from the image that resemble the word that you are typing. Select the right word from the list, and the position of the word on the image will become the region of the field.
- All the fields of the invoice will be used for training purposes, not just those whose markup you have added or modified.
- Repeat steps 4-6 for the next document.
- When the third and each subsequent invoice from the same vendor is added to the batch, the program starts the training process. The program will either train a special FlexiLayout (a FlexiLayout Variant) or suggest that the user gather more examples (in this case, move to the next document and go back to step 4).
If the FlexiLayout for the variant has been trained successfully, it will be used for the next invoice from this vendor that is assigned to this Document Variant. After recognition, field regions will be placed on the invoice image based on the training results.
When a new image is added to the batch, the program evaluates how well the FlexiLayout for the variant applies. If the added image lowers the quality with which field regions are applied, the image will not be used. Otherwise, it will be used for testing.
- Add a few more invoices from the vendor whose Document Variant has been trained and recognize them. Then open the newly added invoices one by one in the document editor to check if the regions are marked up correctly. If all the regions are located correctly, no additional training is required.
If you are not satisfied with the results, continue training the program on invoices from this vendor (repeat steps 4-6). From now on, the training process will start each time you do this. If training is successful, a new FlexiLayout Variant will be created.
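The training cycle described in the steps above can be summarized as a per-variant state model. This Python sketch is a simplified, hypothetical illustration of the documented rules (training begins with the third invoice of a Document Variant, and a later image is not used if it lowers the quality of field-region placement); it is not how FlexiCapture is implemented. `VariantTrainer`, `MIN_EXAMPLES`, and the `quality` scoring function are assumptions, and the model ignores the case where the program asks for more examples even after three invoices.

```python
from typing import Callable, Dict, List

# Hypothetical scoring function: maps a variant's example images to a quality
# score (higher is better). In the product this evaluation is internal.
QualityFn = Callable[[List[str]], float]

# Per this article, training starts with the third invoice of a vendor.
MIN_EXAMPLES = 3


class VariantTrainer:
    """Hypothetical model of per-Document-Variant training (not an ABBYY API)."""

    def __init__(self, quality: QualityFn) -> None:
        self.quality = quality
        self.examples: Dict[str, List[str]] = {}  # variant -> accepted images

    def add_image(self, variant: str, image: str) -> str:
        current = self.examples.setdefault(variant, [])
        if len(current) + 1 < MIN_EXAMPLES:
            # Not enough examples yet: store the image and wait for more.
            current.append(image)
            return "collect more examples"
        if len(current) + 1 == MIN_EXAMPLES:
            # The third example triggers training of a FlexiLayout Variant.
            current.append(image)
            return "training started"
        # Variant already has enough examples: keep the new image only if it
        # does not lower the quality of field-region placement.
        if self.quality(current + [image]) < self.quality(current):
            return "rejected"
        current.append(image)
        return "accepted"


def demo_quality(images: List[str]) -> float:
    # Toy score for the demo: share of images not labeled "bad".
    return sum(1 for i in images if not i.startswith("bad")) / len(images)


trainer = VariantTrainer(demo_quality)
results = [trainer.add_image("VendorA", img)
           for img in ["inv1", "inv2", "inv3", "bad4", "inv5"]]
print(results)
```

Because examples are stored per Document Variant, invoices from one vendor never change the training state of another vendor, which is why the Vendor group of fields must be correct before training.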