Field-Level Recognition
In the case of field-level recognition, short text fragments are recognized in order to capture data from certain fields. The recognition quality is crucial in this scenario.
This scenario may also be used as part of more complex scenarios where meaningful data are to be extracted from documents (for example, to capture data from paper documents into information systems and databases or to automatically classify and index documents in Document Management Systems).
In this scenario, the system recognizes either several lines of text in only some of the fields or the entire text on a small image. The system computes a certainty rating for each recognized character. The certainty ratings can then be used when checking the recognition results. Additionally, the system may store multiple recognition variants for words and characters in the text, which may then be used in voting algorithms to improve the quality of recognition.
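As an illustration of how certainty ratings and recognition variants might be combined, the sketch below sums the certainty of each candidate string and picks the one with the highest total. It is not the Engine's Voting API: the function names, the 0 to 100 certainty scale, and the threshold are assumptions made for the example.

```python
from collections import defaultdict


def vote(variants):
    """Pick the candidate text with the highest summed certainty.

    `variants` is a list of (text, certainty) pairs. The 0..100 scale
    is an assumption of this sketch, not a documented Engine value.
    """
    if not variants:
        raise ValueError("no recognition variants supplied")
    totals = defaultdict(float)
    for text, certainty in variants:
        totals[text] += certainty
    # On a tie, max() keeps the candidate that was seen first.
    best = max(totals, key=totals.get)
    return best, totals[best]


def uncertain_positions(chars, threshold=50):
    """Return the indexes of characters rated below `threshold`.

    `chars` is a list of (character, certainty) pairs.
    """
    return [i for i, (_, certainty) in enumerate(chars) if certainty < threshold]
```

A summed-certainty vote is only one possible rule; a majority count or a weighted scheme could be substituted without changing the surrounding logic.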
Processing small text fragments in this scenario differs in several ways from the corresponding steps in other scenarios:
- Preprocessing of scanned images or photos
The images to be recognized may include markup and background noise, both of which may hamper recognition. For this reason, any unwanted markup and background noise are removed at this stage.
- Recognition of small text fragments
When recognizing small text fragments, the type of data to be recognized is known in advance. Therefore, the quality of recognition may be improved through the use of external dictionaries, regular expressions, custom recognition languages, and alphabets, and by imposing restrictions on the number of characters in a string. Text fields may contain both printed and handprinted text.
- Working with the recognized data
This scenario requires maximum recognition accuracy in order to keep data verification work to a minimum. The system may compute a certainty rating for each recognized word or character and provide multiple recognition variants from which several Engines may then choose the best candidate by applying voting algorithms.
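The constraints described above (a known data type, a regular expression, a limit on string length, and certainty ratings) can also serve as a check after recognition, to decide which fields need manual verification. The following is a minimal sketch under that assumption. `FieldRule`, `needs_verification`, and the default threshold are hypothetical names and values, not part of ABBYY FineReader Engine.

```python
import re
from dataclasses import dataclass


@dataclass
class FieldRule:
    pattern: str     # regular expression the whole field must match
    max_length: int  # upper bound on the number of characters


def needs_verification(text, certainties, rule, min_certainty=70):
    """Return True when a recognized field should be checked by an operator.

    `certainties` holds one rating per character; the 0..100 scale and the
    default threshold are assumptions of this sketch.
    """
    if len(text) > rule.max_length:
        return True
    if re.fullmatch(rule.pattern, text) is None:
        return True
    # Any single low-rated character is enough to request verification.
    return any(c < min_certainty for c in certainties)
```

For example, a date field could use `FieldRule(r"\d{2}/\d{2}/\d{4}", 10)`, so that a value containing the letter O in place of a zero fails the pattern check and is routed to verification.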
Implementing the scenario
The recommended method of using ABBYY FineReader Engine 12 in this scenario is outlined below. It uses the processing settings deemed most appropriate for this scenario.
Step 1. Loading ABBYY FineReader Engine
Step 2. Loading settings for the scenario
Step 3. Loading and preprocessing the images
Step 4. Setting up the fields to be recognized
Step 5. Recognition
Step 6. Working with the recognized data
Step 7. Unloading ABBYY FineReader Engine
Required resources
You can use the FREngineDistribution.csv file to automatically create a list of the files your application requires. For processing with this scenario, select the following values in column 5 (RequiredByModule):
Core
Core.Resources
Opening
Opening, Processing
Processing
Processing.OCR
Processing.OCR, Processing.ICR
Processing.OCR.NaturalLanguages
Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages
If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.
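The selection described above could be automated along the following lines. This is a sketch only: the delimiter, the presence of a header row, and the assumption that the file name sits in the first column are guesses about the file layout, so check them against your copy of FREngineDistribution.csv. The function name `required_rows` is hypothetical.

```python
import csv

# Values of column 5 (RequiredByModule) listed for this scenario.
REQUIRED_MODULES = {
    "Core",
    "Core.Resources",
    "Opening",
    "Opening, Processing",
    "Processing",
    "Processing.OCR",
    "Processing.OCR, Processing.ICR",
    "Processing.OCR.NaturalLanguages",
    "Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages",
}


def required_rows(csv_file, modules=REQUIRED_MODULES, column=4, delimiter=","):
    """Return the rows whose RequiredByModule value is in `modules`.

    `column` is zero-based, so 4 is the fifth column. The delimiter is an
    assumption; pass a different one if the file uses it.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    return [
        row
        for row in reader
        if len(row) > column and row[column].strip() in modules
    ]
```

Called with an open file object, the function returns the matching rows, from which a file list can be built. Extend `modules` with values such as Opening.PDF when the application uses additional features.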
Additional optimization
These are the sections of the help file where you can find additional information about setting up the parameters for the various processing stages:
- Opening and preprocessing images
  - Image Preprocessing
    Describes a scenario of using ABBYY FineReader Engine to preprocess images.
- Recognition
  - Working with Languages
    Using built-in and custom recognition languages.
  - Working with Dictionaries
    Using dictionaries to improve recognition quality.
  - Recognizing Words with Spaces
    Using dictionaries to recognize words with spaces (such as New York, etc.).
  - Recognizing Handprinted Texts
    Using ICR (Intelligent Character Recognition).
  - Recognizing Checkmarks
    Setting up recognition of checkmarks and groups of checkmarks.
  - Special Predefined Languages in ABBYY FineReader Engine
    The list of recognition languages that contain special language units: addresses, date and time, human names, etc. These languages can be used for field recognition.
- Working with the recognized data
  - Working with Text
    Working with the recognized text, paragraphs, words, and characters.
  - Using Voting API
    Working with words and character recognition alternatives.