Preparing vendor and business unit databases
ABBYY FlexiCapture for Invoices allows you to connect a data set to an ODBC-compatible database (See Using vendor and business unit databases), to load data once, or to update a data set with data from an external database at regular intervals (See Updating data sets).
Therefore, your first step should be deploying external data in an ODBC-compatible database, such as Microsoft Access, Microsoft SQL Server or Oracle.
Eliminating duplicate records in the external database
It is important to determine the field in the external vendor (or business unit) database whose value must be obtained as result of detecting the vendor (or business unit) on an invoice. This field (or set of fields) should become the unique key of the data set.
The unique key of a business unit data set is the Id field (See Columns in the BusinessUnits data set). From the user's point of view, this field is the result of ABBYY FlexiCapture for Invoices detecting the business unit to which the given invoice was issued.
The unique key of a vendor data set should be the value passed to the external information system as a result of ABBYY FlexiCapture for Invoices detecting the vendor that issued the given invoice. If one vendor table is used, the unique key must be associated with the Id field of the data set. This value will be passed to the external information system when the vendor is detected in the invoice.
If you are planning to process within one project invoices issued to different business units of your company, each of which has its own vendor database, you should associate the unique key of each business unit with the BusinessUnitId column of the Vendors data set, and the key of each vendor with the Id column of the Vendors data set. Therefore, the unique key of the vendor record to be passed to the external information system when the vendor is detected will be the pair of Id and BusinessUnitId values. (See Columns in the Vendors data set).
In the explanations that follow, we will consider only vendor detection, as the situation is completely identical in the case of business units.
The unique key of a record must determine the unique combination of parameters which are used for vendor detection. Very often, an external information system will consider a vendor record to be unique relying on more parameters than are specified in the invoice and/or are used for vendor detection.
For example, an external information system may expect to receive an MCN_USD identifier if the vendor My Company Name Ltd. issues an invoice in US dollars, or it may expect an MCN_EUR identifier if the same company issued an invoice in euros. Even though the currency is stated in the invoice (and the occasion when the MCN_USD should be returned can be distinguished from the occasion when the MCN_EUR identifier should be returned), currency is not used for vendor detection.
Therefore, the vendor detection mechanism cannot decide between MCN_USD and MCN_EUR.
In cases like this, create an MCN identifier, which will correspond to both MCN_USD and MCN_EUR and which will be returned when the vendor is detected. Next, you can create a rule in the Document Definition that will choose between MCN_USD and MCN_EUR depending on the currency of the given invoice.
To generalize, the unique identifier of a vendor record should have a unique set of parameters that are used for vendor detection, such as company name, company address, taxation IDs (VATID, NationalVATID), and international bank account number (IBAN). (See Columns in the Vendors data set) Only then will the vendor detection mechanism be able to select the right vendor record for an invoice.
Note: Next, the program can capture additional fields and use rules in the Document Definition to fine-tune the result to obtain the required value.
If the same (or easily confused) set of parameters that are used for vendor detection corresponds to multiple unique keys, the program will not be able to choose only one key, vendor detection will become haphazard, and the quality will degrade.
The reason for this is as follows. If the program can use the data printed on the invoice to reliably detect the My Company Name vendor but there are several records (MCN1, MCN2, ... MCNN) in the data set corresponding to the invoice data, the program may decide in favor of another record, which is a worse match for the invoice data but which does not result in so many records. This will ultimately cause the vendor to be detected incorrectly.
Consequently, it is very important to eliminate duplicate records in the external database and get a column in the external database which will have a unique value for each unique set of values of the record fields used for vendor detection.
When connected to the data set, rows with identical values in the column that will be associated with the unique key of the data set, will be automatically collapsed into one record in the data set.
A data set can store multiple values for one logical column of one company record.
Storing multiple values is necessary because such company parameters as company name, street, etc. may vary from invoice to invoice (e.g. we may have "My Company Name" and "MCN Ltd."), but in order to detect the company reliably, the text in the data set must closely match the text captured from an invoice image. Besides, company may have multiple bank accounts or other attributes.
Values for multiple-value columns may be taken from an external database (e.g. a table of vendors may contain one to five bank details for each company), or they may be added by the user during data capture (e.g. an Operator may add to the data set a commonly used variant of a company name that is not known to the external database).
data sets use column denormalization, i.e. they store data in the following form:
|Unique key of data set*||Name1||Name2||...||NameN||...|
|1||My Company Name||MCN Ltd.||...||<empty value>||...|
|2||The Second Company, Inc.||S-Company||...||<empty value>||...|
For one logical field of a record, say, Name, multiple columns are created in the data set, which store all the possible values of the logical field. Thus, Name may be termed a "complex column."
When connecting a table (or a view) from an external database, you can use column denormalization as described above, or more common row denormalization. In the latter case, the rows that should be merged in the data set into the parameters of one record should have the same value of the column that corresponds to the unique key in the data set. For example:
|Column corresponding to unique key in data set*||Name||...|
|1||My Company Name||...|
|2||The Second Company, Inc.||...|
*The unique key of a BusinessUnits data set is the Id field. The unique key of a Vendors data set is either the Id field or the pair of fields Id and BusinessUnitId, depending on the settings. (See Columns in the Vendors data set)