Field-Level Recognition

In field-level recognition, short text fragments are recognized in order to capture data from specific fields. Recognition quality is crucial in this scenario.

This scenario may also be used as part of more complex scenarios where meaningful data are to be extracted from documents (for example, to capture data from paper documents into information systems and databases or to automatically classify and index documents in Document Management Systems).

In this scenario, the system recognizes either several lines of text in only some of the fields or the entire text on a small image. The system computes a certainty rating for each recognized character. The certainty ratings can then be used when checking the recognition results. Additionally, the system may store multiple recognition variants for words and characters in the text, which may then be used in voting algorithms to improve the quality of recognition.

Processing small text fragments in this scenario differs in some respects from the corresponding steps in other scenarios:

  1. Preprocessing of scanned images or photos

The images to be recognized may include markup and background noise, both of which may hamper recognition. For this reason, any unwanted markup and background noise are removed at this stage.

  2. Recognition of small text fragments

When recognizing small text fragments, the type of data to be recognized is known in advance. Therefore, the quality of recognition may be improved through the use of external dictionaries, regular expressions, custom recognition languages, and alphabets, and by imposing restrictions on the number of characters in a string. Text fields may contain both printed and handprinted text.

  3. Working with the recognized data

This scenario requires maximum recognition accuracy in order to keep data verification work to a minimum. The system may compute a certainty rating for each recognized word or character and provide multiple recognition variants, from which several Engines may then choose the best candidate by applying voting algorithms; a minimal voting sketch follows this list.
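
For illustration, the listing below is a minimal sketch of such a voting step. It is independent of the FineReader Engine API and assumes only that each recognition pass yields, for every character position, a candidate character with a confidence value; the Candidate class and voteCharacter method are hypothetical names introduced here.

    import java.util.*;

    // Hypothetical container for one recognition variant of a character:
    // the character and the confidence the recognizer reported for it (0..100).
    final class Candidate {
        final char symbol;
        final int confidence;
        Candidate( char symbol, int confidence ) { this.symbol = symbol; this.confidence = confidence; }
    }

    public class VotingExample {
        // Confidence-weighted voting: each variant votes for its character with a weight
        // equal to its confidence; the character with the highest total weight wins.
        static char voteCharacter( List<Candidate> variants ) {
            Map<Character, Integer> scores = new HashMap<>();
            for ( Candidate c : variants ) {
                scores.merge( c.symbol, c.confidence, Integer::sum );
            }
            return Collections.max( scores.entrySet(), Map.Entry.comparingByValue() ).getKey();
        }

        public static void main( String[] args ) {
            // Variants for one character position, e.g. from two recognition passes.
            List<Candidate> variants = Arrays.asList(
                new Candidate( 'O', 55 ), new Candidate( '0', 80 ), new Candidate( '0', 60 ) );
            System.out.println( voteCharacter( variants ) );   // prints 0
        }
    }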

Implementing the scenario

Below is a detailed description of the recommended way to use ABBYY FineReader Engine 12 in this scenario. It relies on the processing settings deemed most appropriate for field-level recognition.

Step 1. Loading ABBYY FineReader Engine
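
The fragments under Steps 1-7 below form one continuous Java sketch based on the Engine object model. This first fragment loads the Engine in-process; the placeholders customerProjectId, licensePath and licensePassword stand for the values supplied with your license, and the exact InitializeEngine signature of the Java wrapper (including the two reserved strings and the final flag) is an assumption to check against the samples in your distribution.

    import com.abbyy.FREngine.*;

    // Load ABBYY FineReader Engine 12 in-process.
    IEngine engine = Engine.InitializeEngine(
        customerProjectId, licensePath, licensePassword, "", "", false );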

Step 2. Loading settings for the scenario
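
Continuing the sketch: the predefined FieldLevelRecognition profile applies the settings recommended for this scenario.

    // Apply the processing settings recommended for field-level recognition.
    engine.LoadPredefinedProfile( "FieldLevelRecognition" );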

Step 3. Loading and preprocessing the images
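
Continuing the sketch: the image containing the fields is opened into an FR document, and its first page is taken for further work. The collection accessor name below is an assumption; image preprocessing (removal of markup and background noise) is left to the settings loaded in Step 2.

    // Open the image with the fields to be recognized.
    IFRDocument document = engine.CreateFRDocumentFromImage( imagePath );
    IFRPage page = document.getPages().getElement( 0 );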

Step 4. Setting up the fields to be recognized
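
Continuing the sketch: for each field, a region describing its position is created and a text block is added to the page layout. Field-specific constraints (recognition language, text type, a regular expression, a custom alphabet or dictionary) are then set on that block's recognizer parameters; their property names depend on the field type and are not spelled out here. The CreateRegion/AddRect/AddBlock calls follow the Layout object model, but treat the exact signatures as assumptions.

    // Describe the position of the field on the image...
    IRegion region = engine.CreateRegion();
    region.AddRect( left, top, right, bottom );   // field coordinates in pixels

    // ...and add a text block for it to the page layout.
    IBlock block = page.getLayout().AddBlock( BlockTypeEnum.BT_Text, region );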

Step 5. Recognition
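
Continuing the sketch: since the layout has been set up manually, only recognition is performed, without automatic page analysis. Whether the page-level call below takes these arguments, or whether your workflow should instead run document processing with analysis disabled, is an assumption to verify against the field-level sample shipped with the Engine.

    // Recognize the manually defined field blocks on the page.
    page.Recognize( null, null );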

Step 6. Working with the recognized data
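
Continuing the sketch: the recognized value of each field block is read back together with per-character information, so that low-certainty characters can be flagged for verification or resolved by voting. The character-level properties exist in the object model, but the exact Java accessor names used below (getCharConfidence and the collection accessors) are assumptions; confidenceThreshold is a value you choose.

    // Read back the recognized field value and inspect each character.
    final int confidenceThreshold = 50;   // example threshold (0..100)
    IText text = block.GetAsTextBlock().getText();
    ICharParams charParams = engine.CreateCharParams();
    for ( int p = 0; p < text.getParagraphs().getCount(); p++ ) {
        IParagraph paragraph = text.getParagraphs().getElement( p );
        String value = paragraph.getText();
        for ( int i = 0; i < value.length(); i++ ) {
            paragraph.GetCharParams( i, charParams );
            if ( charParams.getCharConfidence() < confidenceThreshold ) {
                // Low-certainty character: flag the field for verification, or resolve
                // it by voting over its recognition variants (see the sketch above).
            }
        }
    }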

Step 7. Unloading ABBYY FineReader Engine
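
Continuing the sketch: when processing is finished, the document is closed and the Engine is unloaded.

    // Release the document and unload the Engine.
    document.Close();
    Engine.DeinitializeEngine();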

Required resources

You can use the FREngineDistribution.csv file to automatically create a list of files required for your application to function. For processing with this scenario, select the rows that contain the following values in column 5 (RequiredByModule):

Core

Core.Resources

Opening

Opening, Processing

Processing

Processing.OCR

Processing.OCR, Processing.ICR

Processing.OCR.NaturalLanguages

Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages

If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, the recognition languages, and any additional features your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.
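
If you script the creation of the file list, the sketch below shows one way to filter FREngineDistribution.csv. It assumes only what is described above: the RequiredByModule value is in column 5 and a cell may name one or two modules; the delimiter, quoting rules, and the column holding the file name are assumptions to check against your copy of the file.

    import java.nio.file.*;
    import java.util.*;

    public class DistributionFilter {
        // The RequiredByModule values needed for this scenario (see the list above).
        static final Set<String> REQUIRED_VALUES = new HashSet<>( Arrays.asList(
            "Core", "Core.Resources", "Opening", "Opening, Processing", "Processing",
            "Processing.OCR", "Processing.OCR, Processing.ICR",
            "Processing.OCR.NaturalLanguages",
            "Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages" ) );

        public static void main( String[] args ) throws Exception {
            for ( String line : Files.readAllLines( Paths.get( "FREngineDistribution.csv" ) ) ) {
                String[] columns = line.split( ";" );   // delimiter is an assumption
                if ( columns.length < 5 ) continue;
                if ( REQUIRED_VALUES.contains( columns[4].trim() ) ) {
                    System.out.println( columns[0] );   // file name assumed to be in column 1
                }
            }
        }
    }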

Additional optimization

Additional information about setting up the parameters for the various processing stages can be found in the corresponding sections of this help file.

See also

Basic Usage Scenarios Implementation
