English (English)

Chinese Simplified (简体中文)

Field-Level Recognition

In the case of field-level recognition, short text fragments are recognized in order to capture data from certain fields. The recognition quality is crucial in this scenario.

This scenario may also be used as part of more complex scenarios where meaningful data are to be extracted from documents (for example, to capture data from paper documents into information systems and databases or to automatically classify and index documents in Document Management Systems).

In this scenario, the system recognizes either several lines of text in only some of the fields or the entire text on a small image. The system computes a certainty rating for each recognized character. The certainty ratings can then be used when checking the recognition results. Additionally, the system may store multiple recognition variants for words and characters in the text, which may then be used in voting algorithms to improve the quality of recognition.

The processing of small text fragments in this scenario is in some ways different from the same steps in other scenarios:

Preprocessing of scanned images or photos

The images to be recognized may include markup and background noise, both of which may hamper recognition. For this reason, any unwanted markup and background noise are removed at this stage.

Recognition of small text fragments

When recognizing small text fragments, the type of data to be recognized is known in advance. Therefore, the quality of recognition may be improved through the use of external dictionaries, regular expressions, custom recognition languages, and alphabets, and by imposing restrictions on the number of characters in a string. Text fields may contain both printed and handprinted text.

Working with the recognized data

This scenario requires maximum recognition accuracy in order to keep data verification work to a minimum. The system may compute a certainty rating for each recognized word or character and provide multiple recognition variants from which several Engines may then choose the best candidate by applying voting algorithms.

Implementing the scenario

Below follows a detailed description of the recommended method of using ABBYY FineReader Engine 12 in this scenario. The suggested method uses processing settings deemed most appropriate for this scenario.

Step 1. Loading ABBYY FineReader Engine

To start your work with ABBYY FineReader Engine, you need to create the Engine object. The Engine object is the top object in the hierarchy of the ABBYY FineReader Engine objects and provides various global settings, some processing methods, and methods for creating the other objects.

To create the Engine object, you can use the InitializeEngine function. See also other ways to load Engine object.

C#

public class EngineLoader : IDisposable
{
    public EngineLoader()
    {
        // Initialize these variables with the full path to FREngine.dll, your Customer Project ID,
        // and, if applicable, the path to your Online License token file and the Online License password
        string enginePath = "";
        string customerProjectId = "";
        string licensePath = "";
        string licensePassword = "";
        // Load the FREngine.dll library
        dllHandle = LoadLibraryEx(enginePath, IntPtr.Zero, LOAD_WITH_ALTERED_SEARCH_PATH);
           
        try
        {
            if (dllHandle == IntPtr.Zero)
            {
                throw new Exception("Can't load " + enginePath);
            }
            IntPtr initializeEnginePtr = GetProcAddress(dllHandle, "InitializeEngine");
            if (initializeEnginePtr == IntPtr.Zero)
            {
                throw new Exception("Can't find InitializeEngine function");
            }
            IntPtr deinitializeEnginePtr = GetProcAddress(dllHandle, "DeinitializeEngine");
            if (deinitializeEnginePtr == IntPtr.Zero)
            {
                throw new Exception("Can't find DeinitializeEngine function");
            }
            IntPtr dllCanUnloadNowPtr = GetProcAddress(dllHandle, "DllCanUnloadNow");
            if (dllCanUnloadNowPtr == IntPtr.Zero)
            {
                throw new Exception("Can't find DllCanUnloadNow function");
            }
            // Convert pointers to delegates
            initializeEngine = (InitializeEngine)Marshal.GetDelegateForFunctionPointer(
                initializeEnginePtr, typeof(InitializeEngine));
            deinitializeEngine = (DeinitializeEngine)Marshal.GetDelegateForFunctionPointer(
                deinitializeEnginePtr, typeof(DeinitializeEngine));
            dllCanUnloadNow = (DllCanUnloadNow)Marshal.GetDelegateForFunctionPointer(
                dllCanUnloadNowPtr, typeof(DllCanUnloadNow));
            // Call the InitializeEngine function 
            // passing the path to the Online License file and the Online License password
            int hresult = initializeEngine(customerProjectId, licensePath, licensePassword, 
                "", "", false, ref engine);
            Marshal.ThrowExceptionForHR(hresult);
        }
        catch (Exception)
        {
            // Free the FREngine.dll library
            engine = null;
            // Deleting all objects before FreeLibrary call
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            FreeLibrary(dllHandle);
            dllHandle = IntPtr.Zero;
            initializeEngine = null;
            deinitializeEngine = null;
            dllCanUnloadNow = null;
            throw;
        }
    }
    // Kernel32.dll functions
    [DllImport("kernel32.dll")]
    private static extern IntPtr LoadLibraryEx(string dllToLoad, IntPtr reserved, uint flags);
    private const uint LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008;
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procedureName);
    [DllImport("kernel32.dll")]
    private static extern bool FreeLibrary(IntPtr hModule);
    // FREngine.dll functions
    [UnmanagedFunctionPointer(CallingConvention.StdCall, CharSet = CharSet.Unicode)]
    private delegate int InitializeEngine(string customerProjectId, string licensePath, 
        string licensePassword, string tempFolder, string dataFolder, bool isSharedCPUCoresMode, 
        ref FREngine.IEngine engine);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DeinitializeEngine();
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllCanUnloadNow();
    // private variables
    private FREngine.IEngine engine = null;
    // Handle to FREngine.dll
    private IntPtr dllHandle = IntPtr.Zero;
    private InitializeEngine initializeEngine = null;
    private DeinitializeEngine deinitializeEngine = null;
    private DllCanUnloadNow dllCanUnloadNow = null;
}

C++ (COM)

// Initialize these variables with the path to FREngine.dll, your FineReader Engine customer project ID,
// and, if applicable, the path to the Online License token and Online License password
wchar_t* FreDllPath;
wchar_t* CustomerProjectId;
wchar_t* LicensePath;  // if you don't use an Online License, assign empty strings to these variables
wchar_t* LicensePassword;
// HANDLE to FREngine.dll
static HMODULE libraryHandle = 0;
// Global FineReader Engine object
FREngine::IEnginePtr Engine;
void LoadFREngine()
{
    if( Engine != 0 ) {
    // Already loaded
    return;
    }
    // First step: load FREngine.dll
    if( libraryHandle == 0 ) {
        libraryHandle = LoadLibraryEx( FreDllPath, 0, LOAD_WITH_ALTERED_SEARCH_PATH );
        if( libraryHandle == 0 ) {
            throw L"Error while loading ABBYY FineReader Engine";
        }
    }
    // Second step: obtain the Engine object
    typedef HRESULT ( STDAPICALLTYPE* InitializeEngineFunc )( BSTR, BSTR, BSTR, BSTR, 
        BSTR, VARIANT_BOOL, FREngine::IEngine** );
    InitializeEngineFunc pInitializeEngine =
    ( InitializeEngineFunc )GetProcAddress( libraryHandle, "InitializeEngine" );
    if( pInitializeEngine == 0 || pInitializeEngine( CustomerProjectId, LicensePath, 
        LicensePassword, L"", L"", VARIANT_FALSE, &Engine ) != S_OK ) {
    UnloadFREngine();
    throw L"Error while loading ABBYY FineReader Engine";
    }
}

Step 2. Loading settings for the scenario

Step 3. Loading and preprocessing the images

Step 4. Setting up the fields to be recognized

Now you need to create blocks that contain your fields, and for each specify block type and known characteristics of the data inside.

Perform layout analysis of the document using the Analyze method, or manually add blocks that contain the fields you need to recognize. See Working with Layout and Blocks for instructions. Note that if you are recognizing handprinted text, you will have to add the blocks manually because they cannot be detected during automatic analysis.

For each field, you can now specify its own parameters of recognition. For example, if a field contains some text, use the ITextBlock::RecognizerParams property:

set the text type with the help of the TextTypes property of the RecognizerParams object. E.g., if the field contains digits written in ZIP-code style, use TT_Index text type.
set the language using the SetPredefinedTextLanguage method. Using special predefined languages can be helpful if you know the type of information contained in the field. E.g., if the field contains an address in the US, select the English_US_Address predefined language. This will ensure that the text is recognized more reliably.
set the SaveCharacterRecognitionVariants and SaveWordRecognitionVariants properties of the RecognizerParams object if you need to use the recognition variants for further verification of the result, as described below in step 6. Note that this setting is not available for handprinted texts.

For further details on recognizing different types of fields, consult the Recognizing Checkmarks, Recognizing Handprinted Texts, Recognizing Barcodes, and Recognizing Words with Spaces sections.

C#

// Analyze the document layout
frDocument.Analyze( null, null, null );
// Assume that we know the first block
// in the layout contains an address in the US
FREngine.ITextBlock addressBlock = frDocument.Pages[0].Layout.Blocks[0].GetAsTextBlock();
FREngine.IRecognizerParams paramsAddressBlock = addressBlock.RecognizerParams;
paramsAddressBlock.SetPredefinedTextLanguage( "English_US_Address" );
// Enable collecting recognition variants
paramsAddressBlock.SaveCharacterRecognitionVariants = true;
paramsAddressBlock.SaveWordRecognitionVariants = true;
// Set up the properties of other layout blocks in the same way
...

C++ (COM)

// Analyze the document layout
frDocument->Analyze( 0, 0, 0 );
// Assume that we know the first block
// in the layout contains an address in the US
FREngine::ILayoutBlocksPtr layoutBlocks = frDocument->Pages->Item( 0 )->Layout->Blocks;
FREngine::IRecognizerParamsPtr paramsAddressBlock = layoutBlocks->Item( 0 )->GetAsTextBlock()->RecognizerParams;
paramsAddressBlock->SetPredefinedTextLanguage( L"English_US_Address" );
// Enable collecting recognition variants
paramsAddressBlock->SaveCharacterRecognitionVariants = VARIANT_TRUE;
paramsAddressBlock->SaveWordRecognitionVariants = VARIANT_TRUE;
// Set up the properties of other layout blocks in the same way
...

Step 5. Recognition

Step 6. Working with the recognized data

Step 7. Unloading ABBYY FineReader Engine

After finishing your work with ABBYY FineReader Engine, you need to unload the Engine object. To do this, use the DeinitializeEngine exported function.

C#

public class EngineLoader : IDisposable
{
    // Unload FineReader Engine
    public void Dispose()
    {
        if (engine == null)
        {
            // Engine was not loaded
            return;
        }
        engine = null;
        // Deleting all objects before FreeLibrary call
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        int hresult = deinitializeEngine();

        hresult = dllCanUnloadNow();
        if (hresult == 0)
        {
            FreeLibrary(dllHandle);
        }
        dllHandle = IntPtr.Zero;
        initializeEngine = null;
        deinitializeEngine = null;
        dllCanUnloadNow = null;
        // throwing exception after cleaning up
        Marshal.ThrowExceptionForHR(hresult);
    }
    // Kernel32.dll functions
    [DllImport("kernel32.dll")]
    private static extern IntPtr LoadLibraryEx(string dllToLoad, IntPtr reserved, uint flags);
    private const uint LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008;
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procedureName);
    [DllImport("kernel32.dll")]
    private static extern bool FreeLibrary(IntPtr hModule);
    // FREngine.dll functions
    [UnmanagedFunctionPointer(CallingConvention.StdCall, CharSet = CharSet.Unicode)]
    private delegate int InitializeEngine( string customerProjectId, string LicensePath, string LicensePassword, , , , ref FREngine.IEngine engine);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DeinitializeEngine();
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllCanUnloadNow();
    // private variables
    private FREngine.IEngine engine = null;
    // Handle to FREngine.dll
    private IntPtr dllHandle = IntPtr.Zero;
    private InitializeEngine initializeEngine = null;
    private DeinitializeEngine deinitializeEngine = null;
    private DllCanUnloadNow dllCanUnloadNow = null;
}

C++ (COM)

void UnloadFREngine()
{
 if( libraryHandle == 0 ) {
  return;
 }
 // Release Engine object
 Engine = 0;
 // Deinitialize FineReader Engine
 typedef HRESULT ( STDAPICALLTYPE* DeinitializeEngineFunc )();
 DeinitializeEngineFunc pDeinitializeEngine =
  ( DeinitializeEngineFunc )GetProcAddress( libraryHandle, "DeinitializeEngine" );
 if( pDeinitializeEngine == 0 || pDeinitializeEngine() != S_OK ) {
  throw L"Error while unloading ABBYY FineReader Engine";
 }
 // Now it's safe to free the FREngine.dll library
 FreeLibrary( libraryHandle );
 libraryHandle = 0;
}

Required resources

You can use the FREngineDistribution.csv file to automatically create a list of files required for your application to function. For processing with this scenario, select in the column 5 (RequiredByModule) the following values:

Core

Core.Resources

Opening

Opening, Processing

Processing

Processing.OCR

Processing.OCR, Processing.ICR

Processing.OCR.NaturalLanguages

Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages

If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages and any additional features which your application uses (such as, e.g., Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.

Additional optimization

These are the sections of the help file where you can find additional information about setting up the parameters for the various processing stages:

Opening and preprocessing images

Image Preprocessing
Describes a scenario of using ABBYY FineReader Engine to preprocess images.

Recognition

Working with Languages
Using built-in and custom recognition languages.
Working with Dictionaries
Using dictionaries to improve recognition quality.
Recognizing Words with Spaces
Using dictionaries to recognize words with spaces (such as New York, etc.).
Recognizing Handprinted Texts
Using ICR (Intelligent Character Recognition).
Recognizing Checkmarks
Setting up recognition of checkmarks and groups of checkmarks.
Special Predefined Languages in ABBYY FineReader Engine
The list of recognition languages that contain special language units: addresses, date and time, human names, etc. These languages can be used for field recognition.

Working with the recognized data

Working with Text
Working with the recognized text, paragraphs, words, and characters.
Using Voting API
Working with words and character recognition alternatives.

Your use of this site is conditioned on Your continued compliance with the Terms of Use.

Terms of Use

Disclaimer of Warranty

Limitation of Liability

Transmission and Submission of Information

Downloads

Use of Content

Trademarks

Links to Third-Party Sites

Foreign Legislation

Subscription Terms

Partner Subscription Terms

Field-Level Recognition

Implementing the scenario

Step 1. Loading ABBYY FineReader Engine

C#

C++ (COM)

Step 2. Loading settings for the scenario

C#

C++ (COM)

Step 3. Loading and preprocessing the images

C#

C++ (COM)

Step 4. Setting up the fields to be recognized

C#

C++ (COM)

Step 5. Recognition

C#

C++ (COM)

Step 6. Working with the recognized data

Step 7. Unloading ABBYY FineReader Engine

C#

C++ (COM)

Required resources

Additional optimization

See also