Detecting the main fields
This article describes how the main fields of an invoice are detected and captured.
The program starts processing an invoice by recognizing its text in accordance with the Document Definition settings:
- Recognition mode (Fast / Balanced / Normal / Accurate) determines the speed of recognition and the quality of the resulting text layer. To specify a recognition mode, in the Document Definition Editor, click Document Definition → Document Definition Properties... → Recognition.
- Recognition languages are the languages to be used for recognition. To specify recognition languages, in the Document Definition Editor, click Document Definition → Document Definition Properties... → Document Definition Settings, and then click Edit in the Countries and Languages group to select the required languages.
Note: Recognition languages in FlexiCapture for Invoices are tied to the country settings. When you add an invoice country to the Countries and Languages group, the corresponding languages are automatically added to the Document Definition settings. Invoice fields are extracted once the text has been recognized.
To detect and capture fields on an invoice, the program can use:
- a FlexiLayout
- neural networks
Both methods are described below, together with the algorithm that either combines the results of the two methods or selects the best result.
Using a FlexiLayout
Business unit and vendor
Using pre-determined vendor and business unit values in conjunction with extracted values
The vendor or the business unit can be determined in advance based on the invoice's source (the name of the Scanning Operator or the e-mail address of the message sender).
You can specify the vendor and/or the business unit explicitly prior to automatic detection.
To do so, set the document's registration parameter fc_Predefined:InvoicePredefinedVendorId (for the vendor) or fc_Predefined:InvoicePredefinedBusinessUnitId (for the business unit) to the identifier (Id) of an entry in the Vendors or BusinessUnits data set, respectively.
Setting these parameters does not prevent automatic detection of the vendor and/or the business unit. As a result, in addition to the pre-determined vendor and/or business unit, you get a confidence value, which indicates how well the pre-determined values match the values extracted from the image, as well as the regions of the fields in the Vendor and/or Business Unit field groups.
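The Python sketch below illustrates the idea of pre-determining the vendor from the invoice's source and passing its Id through the registration parameter. It is not FlexiCapture's scripting API: the `SENDER_TO_VENDOR_ID` mapping, the plain `registration_params` dictionary, and the `predefine_vendor` function are hypothetical stand-ins; only the parameter name comes from this article.

```python
# Hypothetical mapping from a sender's e-mail address to an Id in the Vendors data set.
SENDER_TO_VENDOR_ID = {
    "billing@acme.example": "V-1001",
    "invoices@globex.example": "V-2002",
}

# Parameter name taken from this article.
VENDOR_PARAM = "fc_Predefined:InvoicePredefinedVendorId"


def predefine_vendor(registration_params: dict, sender: str) -> dict:
    """Set the predefined vendor Id if the sender is known; otherwise leave the parameters unchanged."""
    vendor_id = SENDER_TO_VENDOR_ID.get(sender.strip().lower())
    if vendor_id is not None:
        registration_params[VENDOR_PARAM] = vendor_id
    return registration_params


params = predefine_vendor({}, "Billing@Acme.example")
print(params)  # {'fc_Predefined:InvoicePredefinedVendorId': 'V-1001'}
```

Automatic detection still runs afterwards, so the predefined Id only seeds the vendor; the confidence value then reports how well it matches what is on the image.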
Invoice Header field group
Amounts field group
Purchase Order field group
Line Items field group
Using neural networks
One of the main advantages of neural networks is their ability to learn from data: they can detect complex dependencies in the input data and generalize from them.
The program includes two neural networks that can be used to capture the following fields:
- InvoiceNumber
- InvoiceDate
- Total
- Vendor \ Name
- Vendor \ Address
- Business Unit \ Name
- Business Unit \ Address
- Purchase Orders \ Order Number
- LineItems:
  - OrderNumber
  - OrderDate
  - Position
  - ArticleNumber
  - Description
  - Quantity
  - Unit of measurement
  - Unit Price
  - Total Price Netto
  - VATPercentage
For maximum precision, the program uses both a FlexiLayout and its neural networks to capture invoice fields. Fields that the neural networks fail to extract are extracted using the FlexiLayout. If a field can be extracted both by the neural networks and by the FlexiLayout, the program combines the results of the two methods or selects the better one. How this is done depends on the field (see Combining the field detection results for details).
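As a rough illustration of the general rule, the Python sketch below merges per-field results from the two methods: the neural network's value is used when it exists, and the FlexiLayout's value fills in everything else. The `merge_fields` function and the dictionary representation of results are assumptions for illustration; the per-field exceptions (data sets, regular expressions, line items) are not modeled.

```python
def merge_fields(nn_results: dict, flexilayout_results: dict) -> dict:
    """Combine per-field results: neural network first, FlexiLayout as fallback.

    Both arguments map field names to extracted values; None means the
    method did not extract the field.
    """
    # Start from the FlexiLayout result, which can cover every field.
    merged = dict(flexilayout_results)
    # Overwrite with every value the neural network actually extracted.
    for name, value in nn_results.items():
        if value is not None:
            merged[name] = value
    return merged


nn = {"InvoiceNumber": "INV-042", "InvoiceDate": None}
fl = {"InvoiceNumber": "INV-O42", "InvoiceDate": "2023-06-18", "Currency": "EUR"}
print(merge_fields(nn, fl))
```

Here the network's invoice number wins, while the date (missed by the network) and the currency (not covered by the networks) come from the FlexiLayout.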
Disabling the neural networks
By default, the neural networks are used as the second method of capturing document fields. If you need to process documents other than invoices within your invoice project, you may want to disable the neural networks, as they were specifically trained to capture invoice fields and may not perform well on other types of documents.
To disable the neural network for the Line Items group:
- Open the Document Definition Editor.
- Click Document Definition → Document Definition Properties... → Document Definition Settings → Additional Fields and Features.
- Disable the Thorough extraction of invoice line items option.
To disable the neural network for the Invoice Header, Vendor, Business Unit, and Purchase Order groups:
- Open the Document Definition Editor.
- Click Document Definition → Document Definition Properties... → Document Definition Settings → Additional Fields and Features.
- Disable the Thorough extraction of invoice header fields option.
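The effect of the two options can be summarized as a small Python model. `NeuralNetworkOptions` and `groups_using_neural_networks` are hypothetical names; only the option labels and the field groups each option controls come from this article. Both options default to enabled because the neural networks are used by default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeuralNetworkOptions:
    """Hypothetical model of the two Document Definition options."""
    thorough_line_items: bool = True     # "Thorough extraction of invoice line items"
    thorough_header_fields: bool = True  # "Thorough extraction of invoice header fields"


def groups_using_neural_networks(opts: NeuralNetworkOptions) -> List[str]:
    """Return the field groups for which a neural network is still used."""
    groups: List[str] = []
    if opts.thorough_header_fields:
        groups += ["Invoice Header", "Vendor", "Business Unit", "Purchase Order"]
    if opts.thorough_line_items:
        groups.append("Line Items")
    return groups


# Project that also processes non-invoice documents: line items via FlexiLayout only.
print(groups_using_neural_networks(NeuralNetworkOptions(thorough_line_items=False)))
```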
Combining the field detection results
How the program combines the field detection results or selects the best result depends on the field. As a general rule, the results obtained by the respective neural network take precedence. The exceptions to this rule are searches based on data sets and searches using regular expressions created for specific customer documents.
Invoice Header field group
The results obtained by the neural network will always have precedence for the following fields:
- Invoice Number
- Invoice Date
- Total
Business unit and vendor
By default, the business unit and vendor are detected using a data set, provided one is selected.
Additionally, the following fields may be detected using the neural network if there is no corresponding record in the data set:
- Name
- Address
If no data set is selected, only the neural network will be used.
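The vendor and business unit rule can be sketched in Python as follows. The `combine_vendor` function, its parameters, and the dictionary representation of a data set record are hypothetical; the sketch is simplified and does not model the FlexiLayout regions or confidence values.

```python
from typing import Optional

# Fields the neural network may supply when the data set has no matching record.
NN_FALLBACK_FIELDS = ("Name", "Address")


def combine_vendor(dataset_selected: bool,
                   dataset_record: Optional[dict],
                   nn_result: dict) -> dict:
    """Pick the vendor (or business unit) data.

    dataset_record is the matching data set entry, or None if no record matched.
    nn_result holds the fields found by the neural network.
    """
    if not dataset_selected:
        # No data set selected: the neural network is the only source.
        return dict(nn_result)
    if dataset_record is not None:
        # A matching data set record takes precedence.
        return dict(dataset_record)
    # Data set selected but no record found: use the network for Name and Address only.
    return {k: v for k, v in nn_result.items() if k in NN_FALLBACK_FIELDS}
```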
Purchase Order field group
The neural network will only be used if the value is not detected by means of a data set or a regular expression.
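A sketch of the Purchase Order rule in Python. The `PO_PATTERN` regular expression, the `known_orders` set standing in for the data set, and the order in which the data set and the regular expression are tried are all assumptions for illustration; the article only states that the neural network is used as a last resort.

```python
import re
from typing import Optional

# Hypothetical customer-specific pattern for order numbers such as "PO-123456".
PO_PATTERN = re.compile(r"\bPO-\d{6}\b")


def detect_order_number(text: str, known_orders: set, nn_value: Optional[str]) -> Optional[str]:
    """Data set first, then the regular expression, then the neural network."""
    # Data set search: any token on the page that matches a known order number.
    tokens = set(re.findall(r"[\w-]+", text))
    in_dataset = sorted(tokens & known_orders)
    if in_dataset:
        return in_dataset[0]
    # Regular expression search.
    match = PO_PATTERN.search(text)
    if match:
        return match.group(0)
    # Neither method found a value: fall back to the neural network.
    return nn_value
```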
Line items
For line item fields, precedence will be given to the results obtained by the neural network. If the neural network detects the entire table of line items, this table will be used for further processing. Otherwise, the program will use the line items detected by means of the FlexiLayout.
If the neural network detects only the Description and TotalPriceNetto fields for each line item, they will be complemented with the fields detected by means of the FlexiLayout.
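The line item rule can be sketched as follows. Rows are modeled as dictionaries; treating "the neural network detected the entire table" as "it returned at least one row" and pairing rows from the two methods by position are simplifying assumptions, not documented behavior. The name `combine_line_items` is hypothetical.

```python
from typing import Optional

# When these are the only fields the network found, FlexiLayout data fills in the rest.
MINIMAL_FIELDS = {"Description", "TotalPriceNetto"}


def combine_line_items(nn_rows: Optional[list], fl_rows: list) -> list:
    """Combine line items from the neural network (nn_rows) and the FlexiLayout (fl_rows).

    Each row maps field names to values. Rows are paired by position (an assumption).
    """
    if not nn_rows:
        # The network did not detect a table: use the FlexiLayout line items.
        return [dict(row) for row in fl_rows]
    only_minimal = all(set(row) == MINIMAL_FIELDS for row in nn_rows)
    if only_minimal and len(nn_rows) == len(fl_rows):
        # Keep the network's Description and TotalPriceNetto, add the other FlexiLayout fields.
        return [{**fl, **nn} for nn, fl in zip(nn_rows, fl_rows)]
    # Otherwise the network's table takes precedence (also when rows cannot be paired).
    return [dict(row) for row in nn_rows]
```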
6/18/2023 5:47:23 PM