Classification based on a database of companies
When to use company-based classification
Classification assigns each document to a particular class (see the Classification section for more information). Each document-issuing company can be treated as a separate class.
Typically, documents originating from the same company will look similar and will have the same types of fields located in the same positions, which makes data extraction easier.
Documents can be classified using a database of companies. This database should be included in the respective ABBYY FlexiCapture project. To populate this database, you can use the list of companies stored in your ERP system. ABBYY FlexiCapture will periodically sync the database of companies with the latest data from your ERP system. If you don't have a database of companies yet, you can create one while capturing data from documents by adding companies to the database at the document verification stage.
The program will search for the necessary fields only on the first and the last page of each document, as company information is usually found on these pages.
Company-based classification has the following advantages over other classification methods:
- There is no need to collect sample document images in order to create a training set, which may require a lot of time and effort.
- Documents can be classified based on up to 100,000 classes, which is much more than in the case of image- and text-based classification.
Company-based classification can be used for field extraction. Each company will have its own section variant, for which you can train or create a separate FlexiLayout.
Note: Within a project, the following can be used simultaneously:
- a document type classifier at the batch type or project level
- a company-based classifier for documents of the same type at the Document Definition level
First, the program will run the document type classifier to determine the class of the documents at hand, and then it will run the company-based classifier for the documents of the required class.
Each document variant that the classifier detects based on the database of companies uniquely identifies the FlexiLayout to be trained. This means that field training will be carried out independently for each company.
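The two-stage order described in the note above can be sketched in Python. This is only an illustration of the control flow, not ABBYY FlexiCapture code: the Document class, the COMPANIES table, and both classifier functions are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str             # recognized text of the first and last pages
    doc_type: str = ""    # filled in by stage 1
    company_id: str = ""  # filled in by stage 2


# Hypothetical stand-in for the database of companies: record Id -> name.
COMPANIES = {"c-001": "Acme Ltd", "c-002": "Globex GmbH"}


def classify_type(doc: Document) -> str:
    """Stage 1: toy document type classifier based on a single keyword."""
    return "Invoice" if "invoice" in doc.text.lower() else "Other"


def classify_company(doc: Document) -> str:
    """Stage 2: look the company up by name; empty string if none matches."""
    lowered = doc.text.lower()
    for record_id, name in COMPANIES.items():
        if name.lower() in lowered:
            return record_id
    return ""


def classify(doc: Document, required_type: str = "Invoice") -> Document:
    doc.doc_type = classify_type(doc)
    # The company-based classifier runs only for documents of the required type.
    if doc.doc_type == required_type:
        doc.company_id = classify_company(doc)
    return doc
```

In this sketch, a letter that mentions "Acme Ltd" keeps an empty company_id because it never reaches the second stage, while an invoice from the same company is assigned the record "c-001".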
Configuring company-based classification
Company-based classification is performed within a Document Definition, i.e. for documents of the same type. Documents of the same type have identical sets of data fields to be extracted (see the Document Definitions section for more information).
To classify documents using a database of companies:
- Right-click a document section and click Properties, or open the Document Definition editor and click Document Definition → Document Definition Properties.
- On the Data Sets tab, select a data set from the list and click the Set Up... button.
- Select the Use database of companies option. By default, the required columns and their types are already specified in the data set. (A data set is essentially a table listing the fields in which the program looks for companies; users cannot modify this table.)
- To connect the data set to an ODBC-compatible database, you need to map each field in the data set to its counterpart in the database. For detailed instructions, see Using vendor and business unit databases.
Note: The program will look for companies whose data set fields have been mapped to their matching database fields. You must map at least one field (e.g. the company name). If a data set field has no matching database field, specify None when mapping such a field.
Note: Only certain fields are used to look for company information on a document. These fields have a small lock icon next to them. You can add your own custom fields when configuring company-based classification, but these fields will only be used to display information.
- To search for company names which have more than one variant, use normalization, a process which reduces all name variants to one standard name. In the Data Set Column Mapping dialog box, specify the necessary type of normalization in the Normalization field (see Normalization of values in data sets for more information).
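As a rough illustration of what normalization does, the following Python sketch reduces several spellings of a company name to one standard form. It does not reproduce the normalization types offered in the Data Set Column Mapping dialog box; the LEGAL_FORMS list and the normalize_company_name function are assumptions made for this example.

```python
import re

# Hypothetical list of legal-form words to drop; extend it as needed.
LEGAL_FORMS = {
    "ltd", "limited", "inc", "incorporated", "llc",
    "gmbh", "ag", "corp", "corporation", "co",
}


def normalize_company_name(name: str) -> str:
    """Reduce a company name variant to one standard form."""
    # Lower-case the name and replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    # Drop legal-form words and collapse runs of whitespace.
    words = [word for word in cleaned.split() if word not in LEGAL_FORMS]
    return " ".join(words)
```

With this sketch, "ACME, Ltd.", "Acme Limited", and "Acme  Inc" all normalize to the same string, so they can be matched against a single database record.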
Sometimes, a company's name can be known in advance, for example, from the data source parameters (e.g. the scanning operator's name or the sender's e-mail address).
ABBYY FlexiCapture has a feature that allows the supplier and the company subdivision to be specified explicitly prior to automatic detection.
To explicitly specify a company or subdivision, set the value of the fc_Predefined:PredefinedSectionVariantId document registration parameter to the identifier (Id) of the appropriate record in the data set. In this case, the automatic company detection procedure will still be performed for the given record. As a result, you will get the explicitly specified company name and a confidence value that indicates how well the explicitly specified name matches the name extracted from the image.
Note: This method can only be used if only one section in a document has multiple variants.
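The following Python sketch shows one way the parameter value might be derived from a data source parameter such as the sender's e-mail address. Only the parameter name comes from the text above; the SENDER_TO_RECORD_ID table, the record Ids, and the registration_parameters function are hypothetical, and the sketch does not show how the parameters are passed to ABBYY FlexiCapture.

```python
from typing import Dict

# Parameter name as given in the text above.
PARAM_NAME = "fc_Predefined:PredefinedSectionVariantId"

# Hypothetical mapping from sender e-mail address to the Id of a data set record.
SENDER_TO_RECORD_ID = {
    "billing@acme.example": "1042",
    "invoices@globex.example": "2210",
}


def registration_parameters(sender_email: str) -> Dict[str, str]:
    """Build registration parameters for a document from its sender address."""
    record_id = SENDER_TO_RECORD_ID.get(sender_email.strip().lower())
    if record_id is None:
        # Unknown sender: set nothing and leave detection fully automatic.
        return {}
    return {PARAM_NAME: record_id}
```

Returning an empty dictionary for unknown senders keeps the default behavior, so a company is specified explicitly only when the sender is recognized.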
Checking and editing company-based classification results
No training is required when classifying documents using a database of companies, as the program will look up companies in a predefined list of company names. Classification errors can be corrected by operators. Whenever the program attributes a document to the wrong company, the operator can select the correct company name and save it to the database. The program will then use this correct information in future classifications.
To enable the operator to correct classification errors, you need to display classification results on the data form and add a button that will initiate field look-up. To do this, complete the following steps:
1. Create a service field:
   - In the Document Definition Editor, click Create Field → Service Field. Next, in the field properties, click the Data Source tab and select Flexible Section Variant ID from the Source list.
   - Create any service fields as may be necessary to identify the company (e.g. IBAN and VATID).
2. Create a database check rule:
   - Right-click the group, click Properties..., click the Rules tab, and then click the New Rule... button.
   - Select Database Check from the list and click OK.
   - In the Data source field, select Data Sets. Then in the Data Sets field, select the necessary data set.
   - In the Field where to save record ID field, select the service field that you created in step 1.
   - Click the Add button and specify the necessary document and database fields. If the values of the document and database fields are different, select the search and replace options (Enter value from database → If values are different).
   Now any fields detected by the classifier for company-based classification purposes will have a region.
3. Add a button to the data form that will open the Look up dialog box:
   - Right-click anywhere on the data form where you want to place the button and click Insert Button on the shortcut menu.
   - On the Format tab, select the database check rule you created in step 2.
   - On the Position tab, specify a name for the button.
Now a verification operator will be able to click this button on the data form to open the Look up dialog box.
Improving company-based classification
Specifying keywords and regular expressions
You can specify keywords and regular expressions to improve company detection. For keywords, use strings that uniquely identify a company, such as data from VATID or IBAN fields.
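To show why identifiers such as VAT IDs and IBANs work well as keywords, here is a minimal Python sketch that finds them with regular expressions and maps them to a company record. The patterns are simplified assumptions (a German-style VAT ID and a generic IBAN shape), the KNOWN_IDENTIFIERS table is hypothetical, and the regular expression syntax accepted by ABBYY FlexiCapture may differ from Python's re module.

```python
import re
from typing import Optional

# Assumed, simplified patterns; real validation needs country-specific rules.
VAT_RE = re.compile(r"\bDE\d{9}\b")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")

# Hypothetical keyword table: identifier -> Id of the company record.
KNOWN_IDENTIFIERS = {
    "DE123456789": "c-002",
    "DE89370400440532013000": "c-002",
}


def find_identifiers(text: str) -> list:
    """Return VAT IDs and IBANs found in the text, with IBAN spaces removed."""
    vat_ids = VAT_RE.findall(text)
    ibans = [match.group().replace(" ", "") for match in IBAN_RE.finditer(text)]
    return vat_ids + ibans


def detect_company(text: str) -> Optional[str]:
    """Return the record Id for the first known identifier, or None."""
    for identifier in find_identifiers(text):
        if identifier in KNOWN_IDENTIFIERS:
            return KNOWN_IDENTIFIERS[identifier]
    return None
```

Because an identifier of this kind belongs to exactly one company, a single match is enough to pick the record, which is not true of a name that may appear in several variants.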
Editing company records
Another way to improve company detection is by editing the company records stored by ABBYY FlexiCapture. For each company, multiple name variants and addresses can be specified. This can be done by the administrator using the Document Definition editor or by a verification operator.
Please note that only company records stored by ABBYY FlexiCapture will be modified. Even if synchronization with an external database (e.g. an ERP system) is enabled, no changes made by the administrator or verification operators will be transferred to the external database.
Operators can add new records and edit existing records if allowed by the Document Definition.
By default, operators are not allowed to add or edit records. To allow adding and editing records by operators:
- In the Document Definition editor, click Document Definition → Document Definition Properties....
- In the dialog box that opens, click the Data Sets tab.
- Select a data set from the list and click the Set Up... button.
- Select the Operators can add records and Operators can edit records options.
To prevent operators from adding and editing records, clear the above two options.