Training with clustering
During training, ABBYY FlexiCapture places each incoming document into the training batch associated with the company that issued it. Typically, documents from the same company have similar layouts, which means that you can train a FlexiLayout and use it at the verification stage. If documents originating from the same company have widely varying layouts, you should use the clustering feature. When the clustering feature is turned on, ABBYY FlexiCapture for Invoices automatically analyzes documents and puts them into groups (termed “clusters”) based on features that they have in common. A separate FlexiLayout is created for each cluster.
The clustering feature is turned on by default. To disable clustering, complete the following steps:
- In the Document Definition editor, click Document Definition → Document Definition Properties....
- In the dialog box that opens, click the Document Definition Settings tab.
- Click the Edit... button to the right of the Additional Fields and Features group.
- In the Document Definition Features dialog box, clear the Enable clustering option.
Training documents are placed into the batch associated with their respective company. If the clustering feature is turned on and you receive documents with widely varying layouts from the same company, these documents are clustered inside the training batch used for that company, and a separate FlexiLayout is trained for each cluster. Training starts as soon as a cluster contains at least one document. Clustering is a fully automatic process, and the actual clusters remain invisible to the user.
If you have no databases but still want to use field training, you can accumulate company information by adding records to your data sets while capturing documents. For more information, see Looking up vendors and business units in the database.
Training produces a FlexiLayout. Please note the following:
- If the clustering feature is turned off, documents will be placed into their appropriate training batches used for their respective companies and a FlexiLayout will be created for each company.
- If the clustering feature is turned on, documents will be clustered inside the training batch and a FlexiLayout will be created for each cluster.
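The difference between the two modes can be illustrated with a minimal sketch. It is not ABBYY FlexiCapture code: the function `group_for_training`, the dictionary keys, and the `layout_signature` value are all hypothetical, and the signature merely stands in for whatever features the real clustering algorithm compares, which the product does not expose.

```python
from collections import defaultdict


def group_for_training(documents, clustering_enabled):
    """Group documents into the units that would each get their own FlexiLayout.

    Each document is a dict with the hypothetical keys 'name', 'company'
    and 'layout_signature'. The signature is only a stand-in for the
    features compared by the real (internal) clustering algorithm.
    """
    groups = defaultdict(list)
    for doc in documents:
        if clustering_enabled:
            # One group per cluster inside the company's training batch.
            key = (doc["company"], doc["layout_signature"])
        else:
            # One group per company, regardless of layout.
            key = (doc["company"], None)
        groups[key].append(doc["name"])
    return dict(groups)


docs = [
    {"name": "inv-001", "company": "Acme", "layout_signature": "A"},
    {"name": "inv-002", "company": "Acme", "layout_signature": "B"},
    {"name": "inv-003", "company": "Acme", "layout_signature": "A"},
    {"name": "inv-004", "company": "Globex", "layout_signature": "A"},
]

with_clustering = group_for_training(docs, clustering_enabled=True)
without_clustering = group_for_training(docs, clustering_enabled=False)

# Each group corresponds to one FlexiLayout that would be trained.
print(len(with_clustering), len(without_clustering))
```

In this example, the two Acme layouts end up in separate groups when clustering is on, whereas with clustering off all Acme documents share a single group.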
Note: When updating a project created in an earlier version of ABBYY FlexiCapture, you can use your existing FlexiLayouts without any modifications. However, if you choose to use the clustering feature, the clustering algorithm will redistribute your documents among the training batches and a new FlexiLayout will be created for each cluster.
The Samples count column shows the number of documents in the batch. The Samples matched column shows the number of documents where the trained FlexiLayout detects 100% of the fields.
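As a rough illustration of how these two columns relate, the sketch below computes both values from per-document field counts. The function `batch_statistics` and the input format are assumptions made for this example only; they are not part of ABBYY FlexiCapture.

```python
def batch_statistics(samples):
    """Return (samples_count, samples_matched) for a training batch.

    `samples` is a hypothetical list of (fields_expected, fields_detected)
    pairs, one pair per document in the batch.
    """
    samples_count = len(samples)
    # A document counts as matched only if every expected field was detected.
    samples_matched = sum(
        1 for expected, detected in samples if detected == expected
    )
    return samples_count, samples_matched


# Three documents: two with all fields detected, one with 8 of 10.
stats = batch_statistics([(10, 10), (10, 8), (12, 12)])
print(stats)
```

A large gap between the two values suggests that the trained FlexiLayout misses fields on many documents in the batch.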
If documents from a specific company are recognized with too many errors, you can create a custom FlexiLayout or export a trained FlexiLayout and edit it in ABBYY FlexiLayout Studio. Once you are done, import your custom or edited FlexiLayout back into the training batch.
To export a trained FlexiLayout from ABBYY FlexiCapture, do the following:
- On the Project Setup Station, switch to the training batch view by clicking Field Training → Open Field Extraction Training Batches (or by pressing Ctrl + Alt + B).
- Right-click the batch and then click Export Trained FlexiLayout... on the shortcut menu.
- In the dialog box that opens, select where you want to save the *.fsp project file containing your FlexiLayout. (You will then be able to open this file in ABBYY FlexiLayout Studio and modify the FlexiLayout.)
You can import a modified or a completely new FlexiLayout into a training batch to be used for one specific company (for details, see Training by users with project setup permissions).
To import a FlexiLayout into a training batch, do the following:
- On the Project Setup Station, switch to the Field Extraction Training Batches view by clicking Field Training → Open Field Extraction Training Batches (or by pressing Ctrl + Alt + B).
- Right-click the batch and then click Import FlexiLayout... on the shortcut menu.
- In the dialog box that opens, select the *.afl file containing your FlexiLayout.
If you are using the clustering feature, please note the following limitations:
- If you are creating a new FlexiLayout manually, make sure that it covers all the possible document variants originating from the given company. You cannot manually create a FlexiLayout for one cluster only.
- Only a FlexiLayout for the main document fields will be exported. No FlexiLayout can be generated and exported for line item fields, as this type of field uses a separate machine learning algorithm, whose results cannot be exported or modified. However, you can still create a FlexiLayout for line item fields manually.
- Only the FlexiLayout trained for the first cluster will be exported.
- After you import a new or modified FlexiLayout into your training batch:
  - There will be no training while processing documents.
  - Clustering will be disabled for this batch.
  - The imported FlexiLayout will be used for processing all documents from this company, regardless of their cluster.
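The effect of importing a FlexiLayout can be summarized in a small conceptual sketch. The batch dictionary, its keys, and the functions `import_layout` and `layout_for_document` are hypothetical and serve only to model the rules listed above; ABBYY FlexiCapture does not expose such an interface.

```python
def import_layout(batch, layout_file):
    """Model the effect of importing a FlexiLayout into a training batch."""
    batch["imported_layout"] = layout_file
    batch["training_enabled"] = False    # no further training for this batch
    batch["clustering_enabled"] = False  # clustering is switched off


def layout_for_document(batch, cluster_id):
    """Pick the FlexiLayout applied to a document from the given cluster."""
    if batch.get("imported_layout") is not None:
        # An imported FlexiLayout overrides every per-cluster layout.
        return batch["imported_layout"]
    return batch["trained_layouts"][cluster_id]


# Hypothetical batch with two automatically trained per-cluster layouts.
batch = {
    "trained_layouts": {0: "auto-cluster-0", 1: "auto-cluster-1"},
    "imported_layout": None,
    "training_enabled": True,
    "clustering_enabled": True,
}

before = layout_for_document(batch, 1)
import_layout(batch, "custom.afl")
after = [layout_for_document(batch, c) for c in (0, 1)]
print(before, after)
```

This is why a manually created FlexiLayout has to cover every document variant from the company: after the import, it is the only layout applied to that company's documents.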