Pattern Training

If a document you want to recognize contains decorative fonts or special characters (for example, mathematical symbols) that are unfamiliar to the program, we recommend using pattern training to improve OCR accuracy. A pattern is created by associating images of characters as they occur in the text with their keyboard counterparts: when the program indicates a character image it cannot recognize, you press the corresponding key on the keyboard. Sometimes you will have to confirm the same character more than once, because the program detects fine differences between character images that are imperceptible to the naked eye. This process is called pattern training, and it reinforces the association between sets of character images and their keyboard counterparts.

It is not advisable to use the training mode in other cases, as the gains in OCR quality will be insignificant compared to the time and effort you will spend on training.

Pattern training is not supported for Asian languages.

In training mode, a user pattern is created, which can be used when performing OCR on the entire text.

You may wish to edit your newly created pattern before launching the OCR process, because an incorrectly trained pattern may adversely affect OCR quality. A pattern should contain only whole characters or ligatures. Remove any characters with cut edges and any character images that were paired with the wrong letter.

Creating and editing patterns

Training patterns

Some important things you should know about pattern training

  • The OCR engine makes no distinction between certain character images and associates them with one and the same keyboard character. For example, straight apostrophes ('), left single quotation marks (‘), and right single quotation marks (’) will all be associated with the straight apostrophe keyboard character. This means that left and right quotation marks will never be reproduced in recognized texts, even if you enter the respective keyboard characters in pattern training mode.
  • For some character images, the OCR engine will choose keyboard counterparts based on the wider context. For example, the image of a circle may be either a zero or the letter O, and the OCR engine will choose between the two alternatives by looking at the neighboring characters. If the circle is surrounded by digits, the program will decide in favor of zero; otherwise it will interpret the circle as the letter O.
  • A trained pattern can only be used to recognize text printed in the same font type and size and scanned at the same resolution as the image on which the pattern was trained.
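
The first two points in the list above can be illustrated with a short Python sketch. This is not the OCR engine's actual code: the names APOSTROPHE_VARIANTS, normalize_trained_character, and resolve_circle are hypothetical, and the rule that both neighbors must be digits is an assumption made for illustration only.

```python
# Illustrative sketch only; the real OCR engine's logic is not documented here.

# Assumed set of apostrophe-like characters that end up as one keyboard character.
APOSTROPHE_VARIANTS = {"'", "\u2018", "\u2019"}


def normalize_trained_character(ch: str) -> str:
    """Map straight, left, and right single quotes to the straight apostrophe."""
    return "'" if ch in APOSTROPHE_VARIANTS else ch


def resolve_circle(left: str, right: str) -> str:
    """Return '0' if both neighboring characters are digits, otherwise 'O'.

    A missing neighbor (empty string) counts as a non-digit in this sketch.
    """
    return "0" if left.isdigit() and right.isdigit() else "O"
```

In this sketch, normalize_trained_character("\u2019") yields the straight apostrophe no matter which quotation mark was typed during training, and resolve_circle("1", "5") yields "0" while resolve_circle("B", "K") yields "O".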

See also

Specifying Document Languages

26.03.2024 13:49:49
