Pre-recognition parameters

FlexiLayouts and Classifiers have a variety of user-defined settings, including pre-recognition settings such as recognition languages, text type, pre-recognition mode and pre-recognition area. Selecting the right pre-recognition settings will help you create FlexiLayouts and Classifiers that are well suited to processing your documents.

You can change pre-recognition settings in the Pre-recognition Properties dialog box. To open this dialog box:

  • Click Properties... on the FlexiLayout or Classifier menu or on the shortcut menu of the FlexiLayout or Classifier.
  • Click the Advanced Pre-recognition Properties... button on the General tab of the Properties of %Name% dialog box.

The Pre-recognition Properties dialog box will open. The options available in this dialog box are listed below.

Option Description
General tab

Text type

The method that was used for printing the text on the documents:

  • Typographic,
  • Matrix printer,
  • Typewriter.

Determine the type of text and evaluate its quality before selecting these options.

Pre-recognition mode

Three pre-recognition modes are available: Fast, Balanced and Thorough.

In the fast mode, color and grayscale images are first binarized (i.e. converted to black and white). The fast mode saves time and produces good results for most documents. In the balanced mode, color-sensitive recognition is applied. The balanced mode is slower than the fast mode, but it provides better recognition quality. The thorough mode is recommended if the documents are of poor quality and the other modes produce too many errors.

By default, the balanced mode is selected.
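
The binarization step mentioned for the fast mode can be illustrated with a minimal sketch. The function name `binarize` and the fixed threshold of 128 are assumptions made for illustration only; this page does not describe how the program actually binarizes images, and the real method may be more elaborate.

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Convert a grayscale image (values 0-255) to black (0) and white (1).

    Illustrative only: a fixed global threshold, not the program's method.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# A tiny 2x3 "image": light background pixels and dark text pixels.
image = [
    [250, 240, 30],
    [245, 20, 25],
]
print(binarize(image))  # [[1, 1, 0], [1, 0, 0]]
```

After binarization the color information is gone, which is consistent with the balanced mode, where color-sensitive recognition is applied, being slower but more accurate.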

Languages tab
Text languages The languages used in the documents. You can select one or several languages from the drop-down list. For the full list of available languages, see OCR languages supported in ABBYY FlexiLayout™ Studio.
User dictionaries This group of options lets you add user dictionaries. User dictionaries are used to improve recognition quality by supplementing built-in dictionaries with specialized vocabulary, abbreviations, company names, etc.
Advanced tab

This group contains three barcode processing options:

  • Disable barcode extraction – Select this option if barcodes do not need to be detected on your images. This will speed up document recognition considerably.
  • Extract 2D barcodes: Data Matrix, Aztec, QR Code – Select this option if the images you need to process contain Data Matrix, Aztec or QR Code barcodes. If this option is not selected, these barcodes will not be detected on images and will not be available in the Barcode element properties.
  • Extract post barcodes – Select this option if your images contain postal barcodes, e.g. Australia Post. If this option is not selected, postal barcodes will not be detected on images and will not be available in the Barcode element properties.
    Important! Extracting postal barcodes slows down recognition.
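
The way these barcode options combine can be sketched as follows. This is a hypothetical model, not an ABBYY API: the class `BarcodeOptions`, the function `optional_barcode_types` and the rule that the disable option overrides the other two are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BarcodeOptions:
    """Hypothetical mirror of the three check boxes described above."""
    disable_extraction: bool = False
    extract_2d: bool = False
    extract_postal: bool = False


def optional_barcode_types(opts: BarcodeOptions) -> List[str]:
    """List the optional barcode types that would be searched for.

    Barcode types that do not depend on these options are not modeled here.
    """
    if opts.disable_extraction:
        # Assumption: disabling extraction overrides the other options.
        return []
    types: List[str] = []
    if opts.extract_2d:
        types += ["Data Matrix", "Aztec", "QR Code"]
    if opts.extract_postal:
        types.append("postal")
    return types


print(optional_barcode_types(BarcodeOptions(extract_2d=True)))
```

Keeping both optional flags off unless the documents need them matches the advice above, since each additional barcode type adds to recognition time.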

This group contains options for processing CJK (Chinese, Japanese, and Korean) languages.

  • Separated furigana mode – Select this option to improve recognition quality when processing Japanese text with furigana (pronunciation aids).

NER recognition

Extract named entities – Select this option to extract meaningful information from a field or field group using NLP methods.

Note. This option is only available for licenses that include an NLP module.

Vertical text extraction

Vertical text extraction parameters:

  • Extract for all languages – Detects vertically-oriented text written in any of the supported languages.
  • Do not extract – Prevents the detection of vertically-oriented text.
  • Extract for CJK languages – Detects vertical text written in Chinese, Japanese or Korean.
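
The three vertical text settings can be summarized in a short sketch. The enum `VerticalTextMode` and the function `detects_vertical_text` are hypothetical names used for illustration, not part of the program; the sketch also assumes the language passed in is one the program supports.

```python
from enum import Enum


class VerticalTextMode(Enum):
    ALL_LANGUAGES = "Extract for all languages"
    DO_NOT_EXTRACT = "Do not extract"
    CJK_ONLY = "Extract for CJK languages"


# CJK languages as named on this page.
CJK_LANGUAGES = {"Chinese", "Japanese", "Korean"}


def detects_vertical_text(mode: VerticalTextMode, language: str) -> bool:
    """Return True if vertical text in `language` would be detected."""
    if mode is VerticalTextMode.DO_NOT_EXTRACT:
        return False
    if mode is VerticalTextMode.CJK_ONLY:
        return language in CJK_LANGUAGES
    # ALL_LANGUAGES: any supported language qualifies (support not checked here).
    return True


print(detects_vertical_text(VerticalTextMode.CJK_ONLY, "Japanese"))  # True
print(detects_vertical_text(VerticalTextMode.CJK_ONLY, "English"))   # False
```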
Pre-recognition area The area to be pre-recognized. You can specify the position of the pre-recognition area relative to the page edges.
User pattern This option allows you to add a user pattern created in ABBYY FineReader Professional/Corporate Edition 9.0. We recommend using these user patterns if your documents contain non-standard fonts and characters.

4/13/2021 11:12:27 AM
