Machine-Readable Zone Capture
The official travel or identity documents of many countries contain a machine-readable zone (MRZ), which allows the document data to be processed more accurately. The MRZ consists of 2 or 3 lines of text printed in the OCR-B font in accordance with ICAO Document 9303 (see the specifications on the ICAO website).
This scenario is used for extracting data from the machine-readable zone of ID documents during customer onboarding or verification processes. The system detects the MRZ on the document image and extracts the data from it. The extracted data contains several fields with information about the document and its holder (the document type and expiry date, the holder's first and last names, etc.). You may search through the fields, verify the data, and save it to an external file for further processing.
To extract the data from an MRZ, image files obtained by scanning or saved in electronic format typically go through several processing stages, each of which has its own specifics:
- Preprocessing of scanned images or photos
You either scan or take a photo of the identity page of an ID document that contains the MRZ. Photos taken with the cameras of mobile devices may have low resolution and quality, and images may require some preprocessing prior to recognition.
- Extracting data from MRZ
No more than one MRZ may be captured from each image. The text of each of the 2 or 3 lines is recognized and parsed to extract the data fields. Some of the fields, and the MRZ as a whole, have checksums that help you verify the data.
- Export to an external file
You may also save the extracted data to an external file. The XML and JSON formats are supported.
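The checksum verification and export stages can be illustrated with a minimal Python sketch. It does not use the ABBYY FineReader Engine API. It only shows the ICAO Document 9303 check digit rule (weights 7, 3, 1 repeated; digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0; the sum is taken modulo 10) applied to the second line of a passport-size (TD3) MRZ, followed by serialization to JSON. The function names and the JSON field names are hypothetical and do not reflect the export schema of the Engine. The sample line is the specimen from ICAO Document 9303.

```python
import json

WEIGHTS = (7, 3, 1)  # ICAO 9303 weighting, repeated across the field


def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value for check digit purposes."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"unexpected MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Compute the check digit of an MRZ field."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def is_valid(field: str, digit: str) -> bool:
    """Compare the computed check digit with the one printed in the MRZ.

    This sketch requires a numeric check digit; it does not handle the case
    where an unused optional field carries a filler instead of a digit.
    """
    return "0" <= digit <= "9" and check_digit(field) == int(digit)


def parse_td3_line2(line: str) -> dict:
    """Split the second line of a TD3 (passport) MRZ and verify its checksums."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line must be 44 characters long")
    checked = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
        "personal_number": (line[28:42], line[42]),
    }
    record = {
        name: {"value": value.rstrip("<"), "checksum_ok": is_valid(value, digit)}
        for name, (value, digit) in checked.items()
    }
    record["nationality"] = line[10:13]
    record["sex"] = line[20]
    # The composite check digit covers the checked fields and their digits.
    composite = line[0:10] + line[13:20] + line[21:43]
    record["composite_checksum_ok"] = is_valid(composite, line[43])
    return record


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(json.dumps(parse_td3_line2(specimen), indent=2))
```

A field whose `checksum_ok` flag is false most likely contains a misrecognized character, so it is a natural candidate for manual review or for re-recognition after additional preprocessing.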
The procedure described below is implemented in the MRZExtraction code sample.
Implementing the scenario
Below is a detailed description of the recommended method of using ABBYY FineReader Engine 12 in this scenario. The proposed method uses the processing settings that are most suitable for this scenario.
Step 1. Loading ABBYY FineReader Engine
Step 2. Loading settings for the scenario
Step 3. Loading and preprocessing the document images
Step 4. Extracting data from MRZ
Step 5. Working with the extracted data
Step 6. Exporting the extracted data
Step 7. Unloading ABBYY FineReader Engine
Required resources
You can use the FREngineDistribution.csv file to automatically create a list of the files required for your application to function. For processing with this scenario, select the following values in column 5 (RequiredByModule):
Core
Core.Resources
Opening
Opening, Processing
Processing
Processing.OCR
Processing.OCR, Processing.ICR
Processing.OCR.NaturalLanguages
Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages
Export
Export, Processing
If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features that your application uses (for example, Opening.PDF if you need to open PDF files). See Working with the FREngineDistribution.csv File for further details.
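The selection by column 5 can be automated with a short script. The following Python sketch is only an illustration under several assumptions that you should check against your copy of the file: that the file is comma-delimited with quoted fields, that the file path is in column 1, and that column 5 holds the RequiredByModule value exactly as listed above. The function name and the sample rows are hypothetical.

```python
import csv
import io

# Values of column 5 (RequiredByModule) needed for this scenario.
REQUIRED_MODULES = {
    "Core",
    "Core.Resources",
    "Opening",
    "Opening, Processing",
    "Processing",
    "Processing.OCR",
    "Processing.OCR, Processing.ICR",
    "Processing.OCR.NaturalLanguages",
    "Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages",
    "Export",
    "Export, Processing",
}


def select_required_files(rows, module_column=4, path_column=0):
    """Return the paths of rows whose module value is in REQUIRED_MODULES.

    Column indexes are zero-based, so column 5 is index 4. The position of
    the file path (assumed to be column 1) is an assumption of this sketch.
    """
    selected = []
    for row in rows:
        if len(row) <= max(module_column, path_column):
            continue  # skip short or empty rows
        if row[module_column].strip() in REQUIRED_MODULES:
            selected.append(row[path_column].strip())
    return selected


# Hypothetical sample rows standing in for FREngineDistribution.csv.
sample = io.StringIO(
    "Bin/Example.Core.dll,x,x,x,Core\n"
    "Bin/Example.Pdf.dll,x,x,x,Opening.PDF\n"
    'Data/Example.Ocr.dat,x,x,x,"Processing.OCR, Processing.ICR"\n'
)
print(select_required_files(csv.reader(sample)))
```

To process the real file, open it with `open(path, newline="")` and pass `csv.reader(f, delimiter=...)` with the delimiter your copy actually uses. If your application needs additional modules such as Opening.PDF, add them to `REQUIRED_MODULES`.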
Additional optimization
These are the sections of the Help file where you can find additional information about setting up the parameters for the various processing stages:
- Loading Engine
  - Working with Profiles
    Provides a detailed description of predefined and user profiles.
- Opening and preprocessing images
  - Image Preprocessing
    Describes a scenario of using ABBYY FineReader Engine to preprocess images.
- Recognition
  - Tuning Parameters of Preprocessing, Analysis, Recognition, and Synthesis
    Customization of document processing using objects of preprocessing, analysis, recognition, and synthesis parameters.
- Working with the extracted data
  - Machine-Readable Zone Fields
    The list of fields that may be extracted from a machine-readable zone by means of ABBYY FineReader Engine 12, and their brief descriptions.
  - Working with Text
    Working with the recognized text, paragraphs, words, and characters.
  - Using Voting API
    Working with word and character recognition alternatives.
7/3/2024 8:50:25 AM