Text Extraction

This scenario extracts the body text of a document together with the text found on logos, seals, and any other elements outside the body text.

The text is kept in its natural reading order, that is, the order in which a human would read it. You can then feed the recognized documents to natural language processing (NLP) engines on your side, for example, to summarize them quickly, search them for sensitive information, or run sentiment analysis on them.

To extract the main text of a document, image files obtained by scanning or saved in an electronic format typically go through several processing stages, each of which has its own peculiarities:

  1. Preprocessing the scanned images or photos

Scanned images may require preprocessing before recognition, for example, if they contain background noise, skewed text, inverted colors, or black margins, or if they have the wrong orientation or an unsuitable resolution.

  2. Recognition of the maximum amount of text on a document image

Images are recognized with settings that ensure that as much text as possible is found and extracted from the document image.

Scenario implementation

Below is a detailed description of the recommended method of using ABBYY FineReader Engine 12 in this scenario. The proposed method uses the processing settings that are most suitable for this scenario.

Step 1. Loading ABBYY FineReader Engine

Step 2. Loading settings for the scenario

Step 3. Loading and preprocessing the images

Step 4. Document recognition

Step 5. Searching for important information

(Optional) Step 6. Document export

Step 7. Unloading ABBYY FineReader Engine
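
The order of the steps above can be sketched in code. The following is only an illustration of the control flow, not the ABBYY FineReader Engine API: the TextExtractionEngine interface, its method names, and the run_text_extraction function are hypothetical stand-ins introduced for this sketch. It shows that export (Step 6) is optional and that unloading (Step 7) should happen even if an earlier step fails.

```python
from typing import Callable, List, Optional, Protocol


class TextExtractionEngine(Protocol):
    """Hypothetical wrapper interface; these are NOT real FineReader Engine calls."""

    def load_settings(self, profile: str) -> None: ...
    def load_images(self, paths: List[str]) -> None: ...
    def recognize(self) -> str: ...
    def export(self, path: str) -> None: ...
    def unload(self) -> None: ...


def run_text_extraction(
    load_engine: Callable[[], TextExtractionEngine],
    profile: str,
    image_paths: List[str],
    find_important: Callable[[str], List[str]],
    export_path: Optional[str] = None,
) -> List[str]:
    """Run the scenario steps in order and return the findings of Step 5."""
    engine = load_engine()                    # Step 1: load the engine
    try:
        engine.load_settings(profile)         # Step 2: load scenario settings
        engine.load_images(image_paths)       # Step 3: load and preprocess images
        text = engine.recognize()             # Step 4: recognize the document
        findings = find_important(text)       # Step 5: search for important information
        if export_path is not None:           # Step 6 (optional): export
            engine.export(export_path)
        return findings
    finally:
        engine.unload()                       # Step 7: always unload the engine
```

In a real application, each method of the hypothetical interface would be implemented with the corresponding FineReader Engine calls described in the individual steps.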

Required resources

You can use the FREngineDistribution.csv file to automatically create a list of the files that your application requires to function. For processing with this scenario, select the following values in column 5 (RequiredByModule):

Core

Core.Resources

Opening

Opening, Processing

Processing

Processing.OCR

Processing.OCR, Processing.ICR

Processing.OCR.NaturalLanguages

Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages

If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features that your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.
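
The selection by column 5 can be automated with a short script. The following is a minimal sketch in Python, not an ABBYY tool. It assumes that the file is comma-delimited with quoted fields where a value itself contains a comma; the delimiter, the function name select_rows, and its parameters are assumptions made for this illustration, so check them against the actual layout of your FREngineDistribution.csv.

```python
import csv
from typing import Iterable, List, Set

# Column 5 (RequiredByModule) values listed above for this scenario.
SCENARIO_MODULES: Set[str] = {
    "Core",
    "Core.Resources",
    "Opening",
    "Opening, Processing",
    "Processing",
    "Processing.OCR",
    "Processing.OCR, Processing.ICR",
    "Processing.OCR.NaturalLanguages",
    "Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages",
}


def select_rows(
    lines: Iterable[str],
    wanted: Set[str] = SCENARIO_MODULES,
    delimiter: str = ",",       # assumption: adjust to the real file format
    module_column: int = 5,     # 1-based index of RequiredByModule
) -> List[List[str]]:
    """Return the rows whose RequiredByModule value is one of the wanted values."""
    selected = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < module_column:
            continue  # skip short or empty rows
        if row[module_column - 1].strip() in wanted:
            selected.append(row)
    return selected
```

A typical call would pass an open file, for example select_rows(open(path, newline="")), where path points to your copy of the file. A header row is skipped automatically because its column 5 value is not in the set. If you modify the scenario, extend the set with the additional module values.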

Additional optimization for specific tasks

  • Scanning
    • Scanning
      Description of the ABBYY FineReader Engine scenario for document scanning.
  • Recognition
    • PageProcessingParams Object
      This object enables customization of analysis and recognition parameters. Using this object, you can indicate which image and text characteristics must be detected (inverted image, orientation, barcodes, recognition language, recognition error margin).
    • SynthesisParamsForPage Object
      This object includes parameters responsible for the restoration of page formatting during synthesis.
    • SynthesisParamsForDocument Object
      This object enables customization of document synthesis: restoration of its structure and formatting.
    • MultiProcessingParams Object
      Simultaneous processing may be useful when processing a large number of images. In this case, the processing load will be spread over the processor cores during image opening and preprocessing, layout analysis, and recognition, which makes it possible to speed up processing.
      Reading modes (simultaneous or consecutive) are set using the MultiProcessingMode property. The RecognitionProcessesCount property controls the number of processes that may be started.
  • Searching for important information
  • Re-reading of document using special parameters for specified data type
  • Saving data

See also

Basic Usage Scenarios Implementation

11/7/2025 12:48:30 PM
