Using Text Type Autodetection

Autodetection determines the type of the text in a block during recognition. It is launched when the TextTypes property of the RecognizerParams object is set to a combination of several TextTypeEnum constants. This mode was designed primarily for recognizing forms. For ordinary OCR, we recommend using it only if absolutely necessary.

When autodetection is on, ABBYY FineReader Engine first tries to detect the type of text in the specified block or group of blocks (the blocks for which the TextTypes property of the RecognizerParams object is set to several constants). The Engine chooses only from the constants specified in the TextTypes property. This property contains a bitwise OR combination of TextTypeEnum enumeration constants, which denote the possible text types used for recognition. For example, if it is set to TT_Normal | TT_Index, ABBYY FineReader Engine will assume that the text contains only common typographic text and digits written in a ZIP-code style, ignoring all other variants. During autodetection, ABBYY FineReader Engine runs recognition for all of the text types specified in the TextTypes property. The OCR results are then compared, and the Engine selects the best result as the final one.
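The OR combination described above can be illustrated with a small standalone Python model. This is only a sketch: the `TextType` class, the helper functions, and the bit values are assumptions made for illustration and are not taken from the Engine headers. Only the constant names come from this page.

```python
from enum import IntFlag


class TextType(IntFlag):
    # Illustrative model of TextTypeEnum. The bit values are assumptions.
    TT_Normal = 0x01
    TT_Typewriter = 0x02
    TT_Matrix = 0x04
    TT_Index = 0x08
    TT_OCR_A = 0x10
    TT_OCR_B = 0x20


def candidate_types(text_types):
    """Return the individual text types contained in a combined value."""
    return [t for t in TextType if t in text_types]


def autodetection_enabled(text_types):
    """Autodetection is launched only when several types are combined."""
    return len(candidate_types(text_types)) > 1


# Equivalent of setting TextTypes to TT_Normal | TT_Index.
text_types = TextType.TT_Normal | TextType.TT_Index
candidates = candidate_types(text_types)
```

Here `candidates` holds the two types the Engine would choose between, and `autodetection_enabled(TextType.TT_Normal)` is false because a single constant does not trigger autodetection.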

How to use autodetection

Autodetection should be used for a set of blocks, all of which contain text of the same type. If a separate text type must be selected for each block, call the RecognizeBlocks method separately for each block, passing a RecognizerParams object that lists the possible text types.

Note: If a single block contains text of different types, recognition will be run for all of the specified text types, but only one result will be selected, so the entire text within the block will be recognized as if it were of a single type. That is why the recognition results for a block containing text of several types may differ from the results for a block that contains text of only one type. For better OCR results, draw a separate block for each type of text.
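The effect described in the note can be sketched with a toy model in Python. Everything here is hypothetical: the `recognize` callback, the confidence score used for comparison, and the stand-in scores are assumptions, since this page does not specify how the Engine compares results. The sketch only shows why one block yields one text type.

```python
def recognize_with_autodetection(block, text_types, recognize):
    """Run `recognize` once per candidate type and keep the single best result.

    `recognize` is a hypothetical callback returning (text, confidence).
    """
    results = {t: recognize(block, t) for t in text_types}
    # One winner is chosen for the whole block, even if its text is mixed.
    best_type = max(results, key=lambda t: results[t][1])
    text, confidence = results[best_type]
    return best_type, text, confidence


def fake_recognize(block, text_type):
    # Stand-in scores. A real engine would derive them from the image.
    scores = {"TT_Normal": 0.62, "TT_Index": 0.55}
    return ("<" + block + " read as " + text_type + ">", scores[text_type])


best_type, text, confidence = recognize_with_autodetection(
    "mixed block", ["TT_Normal", "TT_Index"], fake_recognize
)
```

In this model the whole block is reported as `TT_Normal`, even though part of it might have been read better as `TT_Index`. Splitting the region into two blocks would let each one get its own winner.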

Selecting the set of text types

The speed and accuracy of autodetection depend on the set of text types specified in the TextTypes property. Autodetection is fastest for combinations of the TT_Normal, TT_Matrix, TT_Typewriter, TT_OCR_A, and TT_OCR_B types (which can be called the "fast autodetection set"). In this case, the recognizer is launched only once, autodetection is carried out during OCR, and single words rather than blocks are used to detect the text type. If only one text type has been specified, autodetection is not launched; the Engine simply launches the recognizer that corresponds to the specified text type.

Note: If the TextTypes property is equal to any combination of TT_Matrix, TT_Typewriter, TT_OCR_A, and TT_OCR_B, then italic fonts and superscript/subscript will not be recognized, regardless of the values of the ProhibitItalic, ProhibitSubscript, and ProhibitSuperscript properties of the RecognizerParams object.
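The condition in this note can be expressed as a simple bit test. The following Python sketch uses illustrative bit values and a hypothetical helper name. It also assumes that a single constant from the list counts as a "combination".

```python
from enum import IntFlag


class TextType(IntFlag):
    # Illustrative model of TextTypeEnum. The bit values are assumptions.
    TT_Normal = 0x01
    TT_Typewriter = 0x02
    TT_Matrix = 0x04
    TT_Index = 0x08
    TT_OCR_A = 0x10
    TT_OCR_B = 0x20


# Types for which italic and superscript/subscript are never recognized.
NO_ITALIC_TYPES = (
    TextType.TT_Matrix
    | TextType.TT_Typewriter
    | TextType.TT_OCR_A
    | TextType.TT_OCR_B
)


def italic_and_scripts_ignored(text_types):
    """True if text_types is a non-empty subset of NO_ITALIC_TYPES."""
    value = int(text_types)
    allowed = int(NO_ITALIC_TYPES)
    return value != 0 and (value | allowed) == allowed
```

For example, `italic_and_scripts_ignored(TextType.TT_Matrix | TextType.TT_OCR_A)` is true, while adding `TT_Normal` to the combination makes it false, so the Prohibit* properties apply again.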

If the set of possible text types includes types other than TT_Normal, TT_Matrix, TT_Typewriter, TT_OCR_A, and TT_OCR_B, text types are detected by blocks, not by single words, and autodetection is slower. In this case, the Engine needs to carry out preliminary OCR several times: once for the types from the "fast autodetection set" and once more for each additional text type. The results are then compared and the best text type is selected.
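The cost rule above can be turned into a rough estimate of how many preliminary recognition sessions a given TextTypes value implies. This Python sketch rests on assumptions: the bit values are illustrative, TT_Handprinted is included only as an example of a second type outside the fast set, and the model assumes the shared fast-set session is skipped when no fast-set type is selected, which this page does not state.

```python
from enum import IntFlag


class TextType(IntFlag):
    # Illustrative model of TextTypeEnum. The bit values are assumptions.
    TT_Normal = 0x01
    TT_Typewriter = 0x02
    TT_Matrix = 0x04
    TT_Index = 0x08
    TT_OCR_A = 0x10
    TT_OCR_B = 0x20
    TT_Handprinted = 0x40  # assumed here as another non-fast type


FAST_SET = (
    TextType.TT_Normal
    | TextType.TT_Matrix
    | TextType.TT_Typewriter
    | TextType.TT_OCR_A
    | TextType.TT_OCR_B
)


def preliminary_passes(text_types):
    """Estimate the number of preliminary OCR sessions for autodetection."""
    selected = [t for t in TextType if t in text_types]
    if len(selected) <= 1:
        return 0  # a single type: autodetection is not launched
    extra = [t for t in selected if t not in FAST_SET]
    if not extra:
        return 0  # fast set only: detection happens per word during OCR
    # One shared session for the fast-set types, if any were selected
    # (assumption), plus one session per additional type.
    fast_pass = 1 if len(extra) < len(selected) else 0
    return fast_pass + len(extra)
```

Under these assumptions, TT_Normal | TT_Matrix needs no preliminary sessions, while TT_Normal | TT_Matrix | TT_Index needs two: one for the fast set and one for TT_Index.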

See also

RecognizerParams

TextTypeEnum

03.07.2024 8:50:10
