If your printed document contains non-standard fonts

If a document you want to recognize contains decorative fonts or special characters (e.g. mathematical symbols), we recommend using the training mode to improve OCR accuracy.

It is not advisable to use the training mode in other cases, as the gains in OCR quality will be insignificant compared to the time and effort you will spend on training.

Training mode creates a user pattern, which the program can then use when performing OCR on the entire text.

Using user patterns

To use a user pattern to recognize a document:

  1. Click Tools > Options... to open the Options dialog box and click the Recognition Languages tab.
  2. Select the Use user patterns option.
    If the Also use built-in patterns option underneath the Use user patterns option is selected, ABBYY FineReader will use its built-in patterns in addition to any user patterns you create.
  3. Click the Pattern Editor... button.
  4. In the Pattern Editor dialog box, select a pattern and click OK.
  5. Click the button on the main toolbar at the top of the OCR Editor window.

Creating and training a user pattern

To train a user pattern to recognize new characters and ligatures:

  1. Click Tools > Options... to open the Options dialog box and click the Recognition Languages tab.
  2. Select the Use training to recognize new characters and ligatures option.
    If the Also use built-in patterns option underneath the Use training to recognize new characters and ligatures option is selected, ABBYY FineReader will use its built-in patterns in addition to any user patterns you create.
  3. Click the Pattern Editor... button.
    Pattern training is not supported for Asian languages.
  4. In the Pattern Editor dialog box, click the New... button.
  5. In the Create Pattern dialog box, specify a name for the new pattern and click OK.
  6. Click OK in the Pattern Editor dialog box and then click OK in the Options dialog box.
  7. Click the button in the toolbar at the top of the Image pane.
    If the program encounters a character it does not recognize, the Pattern Training dialog will open and display this character.
  8. Teach the program to read new characters and ligatures.
    A ligature is a combination of two or three characters that are "glued together" (for example, fi, fl, ffi, etc.) and are difficult for the program to separate. In fact, better results can be obtained by treating them as single compound characters.
    Words printed in bold or italic type or words in superscript/subscript may be retained in the recognized text by selecting the corresponding options under Effects.
    To go back to a previously trained character, click the Back button. The frame will jump to its previous location and the latest trained "character image - keyboard character" pairing will be removed from the pattern. The Back button navigates between the characters of one word and will not navigate between words.

Important!

  • You can only train ABBYY FineReader 14 to read the characters included in the alphabet of the OCR language. To train the program to read characters that cannot be entered from the keyboard, use a combination of two characters to denote these non-existent characters or copy the desired character from the Insert Character dialog box (click  to open this dialog box).
  • Each pattern may contain up to 1,000 new characters. However, avoid creating too many ligatures, as this may adversely affect OCR quality.
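
If you train two-character combinations as stand-ins for symbols that cannot be typed, the recognized text will contain those combinations rather than the symbols themselves. The sketch below shows one way to swap them back after exporting the recognized text. It is not part of ABBYY FineReader; the stand-in pairs and the function name are hypothetical and should be replaced with whatever combinations you actually used during training.

```python
# Hypothetical stand-in pairs chosen by the user during training.
# ABBYY FineReader does not define these; adjust them to your own pattern.
STAND_INS = {
    "~a": "\u03b1",  # Greek small letter alpha
    "~b": "\u03b2",  # Greek small letter beta
    "~S": "\u2211",  # n-ary summation sign
}


def restore_symbols(text, stand_ins=STAND_INS):
    """Replace each two-character stand-in with the symbol it denotes."""
    for pair, symbol in stand_ins.items():
        text = text.replace(pair, symbol)
    return text


if __name__ == "__main__":
    print(restore_symbols("~a + ~b"))
```

Choose stand-in pairs that are unlikely to occur in ordinary text, otherwise legitimate text may be replaced as well.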

Selecting a user pattern

ABBYY FineReader lets you use patterns to improve OCR quality. To select the pattern to be used:

  1. Click Tools > Pattern Editor....
  2. In the Pattern Editor dialog box, select one of the patterns in the list and click the Set Active button.

Some important points to remember:

  1. Sometimes the program does not differentiate between very similar characters and treats them as one and the same character. For example, the straight ('), left (‘), and right (’) quotes will be stored in a pattern as a single character (straight quote). This means that left and right quotes will never be used in the recognized text, even if you try to train them.
  2. For some character images, ABBYY FineReader 14 will select the corresponding keyboard character based on the surrounding context. For example, an image of a small circle will be recognized as the letter O if there are letters immediately next to it, and as the digit 0 if there are digits next to it.
  3. A pattern can only be used for documents that have the same font, font size, and resolution as the document used to create the pattern.
  4. You can save your pattern to a file and use it in other OCR projects. See also: OCR projects.
  5. To recognize texts set in a different font, be sure to disable the user pattern. To do this, click Tools > Options... to open the Options dialog box, click the Recognition Languages tab, and select the Use built-in patterns option.
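
The first two points can be illustrated with a short sketch. It does not show how ABBYY FineReader works internally; it only mimics the behavior described above. The function names are hypothetical, and the rule for mixed neighbors (a letter on one side and a digit on the other) is an assumption, since this article does not cover that case.

```python
# Left and right single quotes are stored as a straight quote (point 1).
QUOTE_VARIANTS = str.maketrans({"\u2018": "'", "\u2019": "'"})


def collapse_quotes(text):
    """Map left and right single quotes to the straight quote."""
    return text.translate(QUOTE_VARIANTS)


def resolve_circle(left, right):
    """Choose 'O' or '0' for a circle image from its neighbors (point 2).

    `left` and `right` are the adjacent characters, or "" if there is none.
    Assumption: the digit is chosen only when at least one neighbor is a
    digit and no neighbor is a letter; every other case falls back to 'O'.
    """
    neighbors = [c for c in (left, right) if c]
    has_digit = any(c.isdigit() for c in neighbors)
    has_letter = any(c.isalpha() for c in neighbors)
    if has_digit and not has_letter:
        return "0"
    return "O"


if __name__ == "__main__":
    print(collapse_quotes("\u2018quoted\u2019"))
    print(resolve_circle("1", "5"))
    print(resolve_circle("b", "k"))
```

Because of this behavior, text that needs typographic quotes has to be corrected after recognition rather than through training.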

Editing a user pattern

You may wish to edit your newly created pattern before launching the OCR process. An incorrectly trained pattern may adversely affect OCR quality. A pattern should contain only entire characters or ligatures. Characters with cut edges and characters with incorrect letter pairings should be removed from the pattern.

  1. Click Tools > Pattern Editor....
  2. In the Pattern Editor dialog box, select the pattern you want to edit and click the Edit... button.
  3. In the User Pattern dialog box, select a character and click the Properties... button.

In the dialog box that opens:

  • In the Character field, enter the letter that corresponds to the character.
  • In the Effects field, specify the desired font effect (bold, italic, superscript or subscript).

To delete a character that has been trained incorrectly, click the Delete button in the User Pattern dialog box.

02.11.2018 16:19:18
