Detecting the main fields

This article describes how the main fields of an invoice are detected and captured.

The program starts processing an invoice by pre-recognizing its text in accordance with the Document Definition settings:

  • Pre-recognition mode (Very fast / Fast / Balanced / Thorough) determines the speed of pre-recognition and the quality of the resulting text layer. To specify a pre-recognition mode, in the Document Definition Editor, click Document Definition → Document Definition Properties... → Recognition.
  • Pre-recognition languages are the languages to be used for pre-recognition. To specify pre-recognition languages, in the Document Definition Editor, click Document Definition → Document Definition Properties... → Document Definition Settings, and then click Edit in the Countries and Languages group to select the required languages.

Once an invoice is pre-recognized, the program starts capturing its fields.

To detect and capture fields on an invoice, the program can use:

  • A FlexiLayout
  • Neural networks

Both methods are described below, together with the algorithm that either combines the results of the two methods or selects the best result.

 

Using a FlexiLayout

Business unit and vendor

Using pre-determined vendor and business unit values in conjunction with extracted values

The vendor or the business unit of an invoice can be determined in advance based on the invoice's source (the name of the Scanning Operator or the e-mail address of the message's sender).

You can specify the vendor and/or the business unit explicitly prior to automatic detection.

To do so, set the document's registration parameter fc_Predefined:InvoicePredefinedVendorId (for the vendor) or fc_Predefined:InvoicePredefinedBusinessUnitId (for the business unit) to the identifier (Id) of an entry in the Vendors or BusinessUnits data set, respectively.

Doing this does not prevent automatic detection of the vendor and/or the business unit from taking place. Because automatic detection still runs, in addition to the pre-determined vendor and/or business unit you will get a confidence value, which indicates how well the pre-determined values match the values extracted from the image, as well as the regions of the fields from the Vendor and/or Business Unit field groups.
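
As an illustration, the mapping from an invoice's source to a pre-determined vendor could be sketched in Python as follows. This is a minimal sketch and not product code: the lookup table, the function name predefined_parameters, and the sample Ids are hypothetical. Only the two registration parameter names come from the text above, and how the parameters are actually attached to a document depends on your import setup.

```python
from typing import Dict, Optional

# Registration parameter names as given in this article.
VENDOR_PARAM = "fc_Predefined:InvoicePredefinedVendorId"
BUSINESS_UNIT_PARAM = "fc_Predefined:InvoicePredefinedBusinessUnitId"

# Hypothetical lookup: invoice source (sender e-mail or Scanning Operator name)
# mapped to the Id of an entry in the Vendors data set. The values are made up.
VENDOR_ID_BY_SOURCE: Dict[str, str] = {
    "billing@example.com": "V-1001",
    "operator.smith": "V-2002",
}


def predefined_parameters(
    source: str, business_unit_id: Optional[str] = None
) -> Dict[str, str]:
    """Build the registration parameters to set before automatic detection."""
    params: Dict[str, str] = {}
    # Normalize the source so that lookups ignore case and stray whitespace.
    vendor_id = VENDOR_ID_BY_SOURCE.get(source.strip().lower())
    if vendor_id is not None:
        params[VENDOR_PARAM] = vendor_id
    if business_unit_id is not None:
        params[BUSINESS_UNIT_PARAM] = business_unit_id
    return params
```

If the source is unknown, the sketch returns no parameters, so the vendor and business unit are left entirely to automatic detection.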

Invoice Header field group

Amounts field group

Purchase Order field group

Line Items field group

 

Using neural networks

One of the main advantages offered by neural networks is their ability to self-learn: neural networks can detect complex dependencies existing among input data and make some useful generalizations.

The program includes two neural networks that can be used to capture the following fields:

  • InvoiceNumber
  • InvoiceDate
  • Total
  • Vendor \ Name
  • Vendor \ Address
  • Business Unit \ Name
  • Business Unit \ Address
  • Purchase Orders \ Order Number
  • LineItems:
    • OrderNumber
    • OrderDate
    • Position
    • ArticleNumber
    • Description
    • Quantity
    • Unit of measurement
    • Unit Price
    • Total Price Netto
    • VATPercentage

For maximum precision, the program will use both a FlexiLayout and its neural networks to capture invoice fields. Those fields that the program fails to extract using its neural networks will be extracted using the FlexiLayout. If a field can be extracted both by the neural networks and the FlexiLayout, the program will intelligently combine the results obtained through both methods. How the results are combined depends on the field (see Combining the field detection results for details).
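
The fallback step described above can be sketched as follows. This is a simplified illustration, assuming each method returns a dictionary of field names to values with None for fields it failed to extract. The function name fill_missing_fields is hypothetical. When both methods return a value, the sketch simply keeps the neural network's value and does not attempt to reproduce the per-field combination logic.

```python
from typing import Dict, Optional

FieldValues = Dict[str, Optional[str]]


def fill_missing_fields(neural: FieldValues, flexilayout: FieldValues) -> FieldValues:
    """Prefer the neural network value; fall back to the FlexiLayout value."""
    merged: FieldValues = {}
    for name in sorted(set(neural) | set(flexilayout)):
        value = neural.get(name)
        if value is None:
            # The neural network did not extract this field.
            value = flexilayout.get(name)
        merged[name] = value
    return merged
```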

 

Disabling the neural networks

By default, the neural networks will be used as the second method of capturing document fields. If you need to process documents other than invoices within your invoice project, you may want to disable the neural networks, as they were trained specifically to capture invoice fields and may not perform well on other types of documents.

To disable the neural network for the Line Items group:

  • Open the Document Definition Editor.
  • Click Document Definition Properties... → Document Definition Settings → Additional Fields and Features.
  • Disable the Thorough extraction of invoice line items option.

To disable the neural network for the Invoice Header, Vendor, Business Unit, and Purchase Order groups:

  • Open the Document Definition Editor.
  • Click Document Definition Properties... → Document Definition Settings → Additional Fields and Features.
  • Disable the Thorough extraction of invoice header fields option.

     

Combining the field detection results

How the program combines the field detection results or selects the best result depends on the field. As a general rule, precedence will be given to the results obtained by the respective neural network. Exceptions to this rule are searches based on data sets and searches using regular expressions created for specific customer documents.

   

Invoice Header field group

The results obtained by the neural network will always have precedence for the following fields:

  • Invoice Number
  • Invoice Date
  • Total

   

Business unit and vendor

By default, the business unit and vendor are detected based on a data set, provided a data set is selected.

Additionally, the following fields may be detected using the neural network if there is no corresponding record in the data set:

  • Name
  • VATID (ABN)
  • Address

If no data set is selected, only the neural network will be used.
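
The rules above can be summarized in a short sketch. It is an illustration only: the function detect_party and its inputs are hypothetical, and it assumes that a data set lookup yields either a matching record or None. It also assumes that the neural network contributes only the three fields listed above in both the "no matching record" and the "no data set" cases.

```python
from typing import Dict, Optional

# Fields the neural network may supply, as listed in this article.
NEURAL_FALLBACK_FIELDS = ("Name", "VATID", "Address")


def detect_party(
    data_set_selected: bool,
    data_set_record: Optional[Dict[str, str]],
    neural: Dict[str, str],
) -> Dict[str, str]:
    """Pick vendor or business unit fields from the data set or the neural network."""
    if data_set_selected and data_set_record is not None:
        # A matching record exists: the data set result is used.
        return dict(data_set_record)
    # No data set selected, or no matching record: use the neural network fields.
    return {name: neural[name] for name in NEURAL_FALLBACK_FIELDS if name in neural}
```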

   

Purchase Order field group

The neural network will only be used if the value is not detected by means of a data set or a regular expression.
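
This order of precedence can be sketched as follows. The sketch rests on several assumptions: the pattern PO_PATTERN is a made-up example of a customer-specific regular expression, the data set search is reduced to a plain substring check, and the data set is tried before the regular expression (the article does not state the relative order of these two).

```python
import re
from typing import Iterable, Optional

# Hypothetical customer-specific format, for example "PO-123456".
PO_PATTERN = re.compile(r"\bPO-\d{6}\b")


def pick_order_number(
    text: str, known_orders: Iterable[str], neural_value: Optional[str]
) -> Optional[str]:
    """Return the order number from the data set, else the regex, else the neural network."""
    # 1. Data set search (simplified): a known order number that occurs in the text.
    for order in known_orders:
        if order in text:
            return order
    # 2. Regular expression search.
    match = PO_PATTERN.search(text)
    if match:
        return match.group(0)
    # 3. Only now fall back to the neural network result, which may be None.
    return neural_value
```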

   

Line items

For line item fields, precedence will be given to the results obtained by the neural network. If the neural network detects the entire table of line items, this table will be used for further processing. Otherwise, the program will use the line items detected by means of the FlexiLayout.

If the neural network detects only the Description and TotalPriceNetto fields for each line item, they will be complemented with the fields detected by means of the FlexiLayout.
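
A rough sketch of this selection logic is shown below. It makes assumptions that the article does not confirm: the caller supplies a flag saying whether the neural network detected the entire table, line items from the two methods are matched by position, and the FlexiLayout result is used unchanged when the row counts differ. The function name combine_line_items is hypothetical.

```python
from typing import Dict, List

LineItem = Dict[str, str]

# The partial result described in this article.
PARTIAL_FIELDS = {"Description", "TotalPriceNetto"}


def combine_line_items(
    neural: List[LineItem], flexi: List[LineItem], neural_table_complete: bool
) -> List[LineItem]:
    """Choose or merge line items from the neural network and the FlexiLayout."""
    if neural_table_complete:
        # The neural network found the whole table: use it as is.
        return neural
    only_partial = bool(neural) and all(set(item) == PARTIAL_FIELDS for item in neural)
    if only_partial and len(neural) == len(flexi):
        # Complement each row with FlexiLayout fields; neural values take precedence.
        return [{**f, **n} for n, f in zip(neural, flexi)]
    return flexi
```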

11/10/2020 12:08:04 PM

