Training while processing documents
ABBYY FlexiCapture for Invoices lets you improve recognition quality while processing documents. If the program fails to detect the correct location of a field on a document image, you can specify its correct location and the program will use it when recognizing other documents.
Training is only available if ABBYY FlexiCapture can reliably identify the company by finding the corresponding record in a database. If you have no databases but still want to use field training, you can accumulate company information by adding records to your data sets while capturing documents. For more information, see Looking up vendors and business units in the database.
This article explains how to train ABBYY FlexiCapture for Invoices using the locally installed version of the Verification Station, and covers some training-related issues that operators need to know.
To train the program while processing documents, complete the following steps:
- Collect a batch of documents (e.g. invoices processed within the past month) and start feeding them to the program. See How to capture invoices.
- Once the documents are fed to the program, they are recognized automatically and the data is checked against the validation rules. Automatic recognition takes place only if the Recognize added images automatically option is enabled on the Document Processing tab of the Options dialog box (to open this dialog box, click Tools → Options...).
- If the status of a recognized document is anything other than Valid, or if you have other reasons to believe that the program failed to detect some of the fields, open the document in the document editor.
- Review the document form. The Vendor/Issuer group of fields must be filled out correctly.
Training is done independently for each document variant. Documents from the same company will be deemed to belong to the same document variant. If the program fails to identify the issuing company, either select it from your company database or key it in manually from the document image and save it to your company database by clicking Save.
Depending on your project's settings, you may also have to specify the unique ID of the issuing company to use its document for training. To do this, type the company's unique ID in the VATID field (this field may have a different name in some projects depending on the country). The VATID is a unique identification number assigned to companies for tax purposes.
If documents originating from the same company have widely varying layouts, you should use the clustering feature. For details, see Training with clustering.
- Training will only be successful if the regions of all the fields are identified correctly, so make sure that each region matches the actual location of its field on the image.
To do this, in the image window of the document editor, adjust the existing regions or draw regions for the fields that the program failed to detect. For more information on how to mark up line items on a document, see Training line items.
After that, the program will analyze the document. If the region markup has been modified and training for documents from this company is not prohibited, the document will be added to the training batch.
How to change the region of a field
- Position the mouse pointer in the desired field on the data form, find the corresponding region on the image (it will be highlighted in blue), and click it (or draw a rectangle with the mouse).
- Position the mouse pointer on the desired region on the image (it will be highlighted in blue), click it (or draw the region with the mouse), and then select the corresponding field from the drop-down list that opens.
- Adjust the position of a region on the image by moving its boundaries with the mouse.
- Delete an incorrectly located region from the image: position the mouse pointer on its rectangle and, when a red cross appears in the top right corner, click the red cross. The markup of the region will be deleted. You can then create the correct region for this field.
- On the data form, start typing a value into a field. A drop-down list will be displayed, listing the words captured from the image that look like the word that you are typing. Select the right word from the list, and the position of the word on the image will become the region of the field.
Note: The program will be trained on all the document's fields, not just on those whose regions you have drawn or adjusted.
- Open the next document and repeat steps 4 and 5 (reviewing the document form and correcting the field regions).
- The training process can start only when the training batch contains at least one document. If clustering is used, a separate FlexiLayout will be created for each cluster; otherwise, a FlexiLayout will be created for each company (see Training with clustering for more information).
- The program will test the trained FlexiLayout variant by applying it to all the documents in the training batch and comparing the results with the adjusted markup obtained in step 5 (correcting the field regions). If the program determines that the trained FlexiLayout variant delivers better results than its earlier version, the trained variant will be used the next time you recognize documents belonging to this document variant.
If the program determines that the trained FlexiLayout variant delivers worse results than its earlier version, you will need to continue training it on documents from the given company (repeat steps 4 and 5). The training process is complete when the trained FlexiLayout variant can correctly identify all the field regions.
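The acceptance check described in the last step can be illustrated with a small sketch. It is not ABBYY FlexiCapture code and does not use any ABBYY API. The names TrainingDocument, count_correct_fields, score_layout, should_replace and training_complete are hypothetical, a layout is modeled as a plain function that returns field regions for a document, and "better results" is assumed to mean "more field regions that match the operator's markup within a few pixels". The product may measure quality differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a region is (left, top, right, bottom) in pixels,
# and a markup maps a field name to its region on one document image.
Region = Tuple[int, int, int, int]
Markup = Dict[str, Region]


@dataclass
class TrainingDocument:
    """One document in the training batch with operator-verified markup."""
    name: str
    verified: Markup


# Assumption: a layout is any function that proposes a markup for a document.
Layout = Callable[[TrainingDocument], Markup]


def count_correct_fields(predicted: Markup, verified: Markup, tolerance: int = 5) -> int:
    """Count verified fields whose predicted region lies within the tolerance."""
    correct = 0
    for field, expected in verified.items():
        found = predicted.get(field)
        if found is not None and all(
            abs(a - b) <= tolerance for a, b in zip(found, expected)
        ):
            correct += 1
    return correct


def score_layout(layout: Layout, batch: List[TrainingDocument]) -> int:
    """Apply the layout to every document in the batch and sum the correct fields."""
    return sum(count_correct_fields(layout(doc), doc.verified) for doc in batch)


def should_replace(old: Layout, new: Layout, batch: List[TrainingDocument]) -> bool:
    """Accept the trained layout only if it beats the earlier version on the batch."""
    return score_layout(new, batch) > score_layout(old, batch)


def training_complete(layout: Layout, batch: List[TrainingDocument]) -> bool:
    """Training is done when every verified field region is found correctly."""
    total_fields = sum(len(doc.verified) for doc in batch)
    return score_layout(layout, batch) == total_fields


# Illustrative data only: two invoices from the same company.
batch = [
    TrainingDocument("invoice_1", {"InvoiceNumber": (100, 50, 220, 70),
                                   "Total": (400, 700, 480, 720)}),
    TrainingDocument("invoice_2", {"InvoiceNumber": (102, 52, 221, 71),
                                   "Total": (398, 705, 481, 724)}),
]


def old_layout(doc: TrainingDocument) -> Markup:
    # The earlier version never finds the Total field.
    return {"InvoiceNumber": (100, 50, 220, 70)}


def new_layout(doc: TrainingDocument) -> Markup:
    # The trained version finds both fields close to the verified regions.
    return {"InvoiceNumber": (100, 50, 220, 70), "Total": (400, 702, 480, 722)}


replace_old = should_replace(old_layout, new_layout, batch)
done = training_complete(new_layout, batch)
```

With this sample data, the earlier layout scores 2 correct fields and the trained layout scores 4 out of 4, so the trained layout would replace the earlier one and no further training would be needed for this company.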