Field Extraction Training

In semi-structured documents, a field may sometimes be extracted incorrectly. You can improve extraction quality by training the Document Definition. To do this, provide documents that are typical of your workflow, marked up with the correct region for each field.

Training to extract a field

ABBYY FlexiCapture SDK allows you to train field extraction for a variant of a Document Definition section. Each section in a non-invoice project has a "Variants" data set from which you can select a variant; for invoices, specify a vendor record from the "Vendors" data set instead.

  1. Add a new training batch using the AddNew method of the FieldsExtractionTrainingBatches object, passing the SectionDefinition object for the section and the DataSetTableRecord object for the section variant.
  2. Add the images for training to the batch. Call the Recognize method of the Batch object to match the Document Definitions and recognize all the documents in the batch.
  3. Correct the layout of the fields that will be trained. Assign the correct region to each field by calling the SetFieldRegion method of the Page object.
    Notes:
      • You can get correct regions from an external source, for example, by having a human operator verify and mark up the recognized documents.
      • Only fields that can have a region can be trained. See the description of the CanHaveRegion property of the FieldDefinition object.
  4. Train field extraction. Get the FieldsExtractionTrainer object from the training batch and call its Train method. The Document Definition will be updated automatically.

C# code
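
The order of the four steps above can be sketched as follows. This is a minimal, self-contained illustration and not SDK code: Region, Field, Page, TrainingBatch and TrainingSketch are hypothetical stand-in types whose names merely echo the SDK objects mentioned above, and all of their signatures are assumptions. Consult the SDK reference for the real interfaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only. The names echo the SDK objects
// described above, but every type and signature here is an assumption.
struct Region
{
    public int Left, Top, Right, Bottom;
    public Region(int l, int t, int r, int b) { Left = l; Top = t; Right = r; Bottom = b; }
}

class Field
{
    public string Name;
    public bool CanHaveRegion;
    public Region? AssignedRegion;   // null until a verified region is assigned
    public Field(string name, bool canHaveRegion) { Name = name; CanHaveRegion = canHaveRegion; }
}

class Page
{
    public List<Field> Fields = new List<Field>();

    // Plays the role of SetFieldRegion: store the verified region on the field.
    public void SetFieldRegion(Field field, Region region)
    {
        if (!field.CanHaveRegion)
            throw new InvalidOperationException(field.Name + " cannot have a region.");
        field.AssignedRegion = region;
    }
}

class TrainingBatch
{
    public List<Page> Pages = new List<Page>();
    private bool recognized;

    public void Recognize() { recognized = true; }

    // Stand-in for training: returns how many fields carried a region.
    public int Train()
    {
        if (!recognized) throw new InvalidOperationException("Call Recognize before Train.");
        int trained = 0;
        foreach (Page page in Pages)
            foreach (Field field in page.Fields)
                if (field.AssignedRegion.HasValue) trained++;
        return trained;
    }
}

static class TrainingSketch
{
    public static int Run()
    {
        var batch = new TrainingBatch();          // step 1: create the training batch
        var page = new Page();                    // step 2: add a page, then recognize
        page.Fields.Add(new Field("Total", true));
        page.Fields.Add(new Field("Logo", false));
        batch.Pages.Add(page);
        batch.Recognize();

        // Step 3: assign verified regions, skipping fields that cannot have one.
        foreach (Field field in page.Fields)
            if (field.CanHaveRegion)
                page.SetFieldRegion(field, new Region(100, 200, 300, 240));

        return batch.Train();                     // step 4: train
    }

    static void Main()
    {
        Console.WriteLine(TrainingSketch.Run());  // prints 1
    }
}
```

Only the "Total" field is counted, because the stand-in "Logo" field reports that it cannot have a region. This mirrors the CanHaveRegion note in step 3.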

Child fields training

You can train child fields along with their parent field, provided that the child fields can also have regions. To get the child fields, use the Children property of the Field object. Correct the region for each child field by calling the SetFieldRegion method of the Page object, and then train field extraction as described above.

Repeatable fields and line items training

You can also train a field with several instances. Correct the region for each of the field instances and then train fields extraction as described above.

Line items of an invoice or a purchase order are essentially repeatable groups of fields, so everything in this section applies to them as well.

  1. Get the collection of instances of the field using the Instances property of the Field object. If some of the instances were not found on the page, call the AddNew method of the FieldInstances collection object and add the missing instances.
    Note: All instances have the same field hierarchy, that is, have the same set of child fields, etc.
  2. Set the correct region for each instance using the SetFieldRegion method.
    For line items, consider using the ContinueLineItems method of the Document object after correcting several line items to detect the rest automatically.
  3. Train fields extraction: call the Train method of the FieldsExtractionTrainer object.
    Note: If you have a repeatable group of fields (field type FT_Group), you need to correct the regions of all the child fields for each of the group field instances.

C# code
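
The procedure above for a repeatable group can be sketched in the same spirit. Again, this is an assumption-laden illustration rather than SDK code: the Field class, its AddNewInstance method, the HasRegion flag and the RepeatableSketch helpers are all hypothetical stand-ins. The sketch adds a missing instance, marks a region on every child field of every instance, and checks that nothing was left uncorrected before training would start.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins; the names echo the SDK objects above, signatures are assumed.
class Field
{
    public string Name;
    public bool HasRegion;                               // true once a region is corrected
    public List<Field> Children = new List<Field>();     // child fields
    public List<Field> Instances = new List<Field>();    // instances of a repeatable field
    public Field(string name) { Name = name; }

    // Plays the role of adding an instance to the instances collection.
    // Every instance gets the same set of child fields.
    public Field AddNewInstance(params string[] childNames)
    {
        var instance = new Field(Name);
        foreach (string child in childNames)
            instance.Children.Add(new Field(child));
        Instances.Add(instance);
        return instance;
    }
}

static class RepeatableSketch
{
    // Marks a region on every child of every instance and returns the count.
    public static int CorrectAllInstances(Field group)
    {
        int corrected = 0;
        foreach (Field instance in group.Instances)
            foreach (Field child in instance.Children)
            {
                child.HasRegion = true;   // stand-in for setting a verified region
                corrected++;
            }
        return corrected;
    }

    // Lists child fields that still lack a region; should be empty before training.
    public static List<string> MissingRegions(Field group)
    {
        var missing = new List<string>();
        for (int i = 0; i < group.Instances.Count; i++)
            foreach (Field child in group.Instances[i].Children)
                if (!child.HasRegion)
                    missing.Add(group.Name + "[" + i + "]." + child.Name);
        return missing;
    }

    public static int Run()
    {
        var lineItem = new Field("LineItem");
        lineItem.AddNewInstance("Description", "Amount");   // found during recognition
        lineItem.AddNewInstance("Description", "Amount");   // found during recognition
        lineItem.AddNewInstance("Description", "Amount");   // missing instance, added by hand

        int corrected = CorrectAllInstances(lineItem);
        if (MissingRegions(lineItem).Count != 0)
            throw new InvalidOperationException("Some child fields still lack a region.");
        return corrected;
    }

    static void Main()
    {
        Console.WriteLine(RepeatableSketch.Run());   // prints 6
    }
}
```

A check like MissingRegions reflects the last note above: for a repeatable group, every child field of every instance needs a corrected region before training.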

Samples

See the Fields Extraction Training code sample for an implementation of this scenario.

See also

Field Extraction Training Objects

Retrieving the Matched Section Variant

15.08.2023 13:19:30
