If Your Printed Document Contains Non-Standard Fonts

Training mode improves OCR quality on documents that use decorative fonts or contain special characters (e.g. mathematical symbols).

Note: It is not advisable to use the training mode in other cases, as the gains in recognition quality will be insignificant compared to the time and effort you will spend on training.

Training mode creates a user pattern, which can then be used when performing OCR on the entire text.

Using user patterns

To use a pattern to recognize a document:

  1. Open the Options dialog box (Tools > Options…) and then click the Read tab.
  2. Under Training, select the Use only user pattern option.

Note: If you select Use built-in and user patterns, ABBYY FineReader 12 will use both the user patterns and its factory preset patterns for OCR.

  3. Click the Pattern Editor… button.
  4. In the Pattern Editor dialog box, select the desired pattern and then click OK.
  5. In the ABBYY FineReader main window, click the Read button.

Creating and training a user pattern

To train a user pattern to recognize new characters and ligatures:

  1. Open the Options dialog box (Tools > Options…) and then click the Read tab.
  2. Under Training, select Use built-in and user patterns or Use only user pattern.
  3. Select the Read with training option.
  4. Click the Pattern Editor… button.

Note: Pattern training is not supported for Asian languages.

  5. In the Pattern Editor dialog box, click New…
  6. The Create Pattern dialog box will open. Type the name of the user pattern and click OK.
  7. Close the Pattern Editor and the Options dialog box by clicking the OK button in each.
  8. On the toolbar at the top of the Image window, click Read.

Now if ABBYY FineReader encounters an unknown character, this character will be displayed in a Pattern Training dialog box.

  9. Teach the program to read new characters and ligatures.

A ligature is a combination of two or three characters that are "glued together" (for example, fi, fl, ffi) and are difficult for the program to separate. Better results can be obtained by treating them as single compound characters.

Note: Words printed in bold or italic type in your text or words in superscript/subscript may be retained in the recognized text by selecting the corresponding options under Effects.

To go back to a previously trained character, click the Back button. The frame will jump to its previous location and the latest trained "character image - keyboard character" correspondence will be removed from the pattern. The Back button navigates between characters of one word and will not navigate between words.
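
The Back button behavior described above can be pictured with a small model. This is only an illustrative sketch, not ABBYY FineReader code or an API: the class name TrainingSession, its fields, and the image identifiers are all hypothetical. It shows that going back removes the most recent correspondence and that navigation stops at the start of the current word.

```python
class TrainingSession:
    """Hypothetical model of training the characters of ONE word."""

    def __init__(self, word_length):
        self.word_length = word_length  # number of character images in the word
        self.position = 0               # index of the frame within the word
        self.pattern = []               # (image_id, keyboard_char) pairs trained so far

    def train(self, image_id, keyboard_char):
        """Record a correspondence and move the frame to the next character."""
        if self.position >= self.word_length:
            raise IndexError("no more characters in this word")
        self.pattern.append((image_id, keyboard_char))
        self.position += 1

    def back(self):
        """Undo the latest correspondence; return False at the start of the word."""
        if self.position == 0:
            # Back does not navigate to the previous word.
            return False
        self.pattern.pop()
        self.position -= 1
        return True
```

In this model, calling back() after training two characters leaves one correspondence in the pattern, and a further back() at the first character of the word does nothing.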

Important!

  • You can only train ABBYY FineReader to read the characters included in the alphabet of the recognition language. To train the program to read characters that cannot be entered from the keyboard, use a combination of two characters to denote them, or copy the desired character from the Insert Character dialog box (click the corresponding button to open the dialog box).
  • Each pattern may contain up to 1,000 new characters. However, avoid creating too many ligatures, as this may adversely affect OCR quality.
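
The size limit above can be pictured with a short sketch. It is an illustration under stated assumptions, not the program's actual data structure: the UserPattern class, its methods, and the image identifiers are hypothetical. Only the figure of 1,000 characters per pattern comes from this article.

```python
MAX_PATTERN_CHARACTERS = 1000  # limit stated in this article


class UserPattern:
    """Hypothetical container for trained character images."""

    def __init__(self, name):
        self.name = name
        self.entries = []  # (image_id, text) pairs

    def add(self, image_id, text):
        """Store a trained character; refuse once the pattern is full."""
        if len(self.entries) >= MAX_PATTERN_CHARACTERS:
            raise ValueError("pattern already holds 1,000 characters")
        self.entries.append((image_id, text))

    def multi_character_count(self):
        """Count entries mapped to more than one keyboard character.

        These are ligatures or two-character stand-ins for symbols that
        cannot be typed; keeping this number low follows the advice above.
        """
        return sum(1 for _, text in self.entries if len(text) > 1)
```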

Selecting a user pattern

ABBYY FineReader allows you to use patterns to improve OCR quality. To select a pattern:

  1. On the Tools menu, click Pattern Editor….
  2. In the Pattern Editor dialog box, select the desired pattern from the list of available patterns and click Set Active.

Some important points to remember:

  1. Rather than differentiating between some similar yet different characters, ABBYY FineReader recognizes them as one and the same character. For example, the straight ('), left (‘), and right (’) quotes will be stored in a pattern as a single character (straight quote). This means that left and right quotes will never be used in the recognized text, even if you try to train them.
  2. For some character images, ABBYY FineReader will select the corresponding keyboard character based on the surrounding context. For example, an image of a small circle will be recognized as the letter O if there are letters immediately next to it and as the number 0 if there are digits next to it.
  3. A pattern can only be used for documents that have the same font, font size, and resolution as the document used to create the pattern.
  4. To be able to use a pattern later, save it to a file. See "What Is a FineReader Document?" for details.
  5. To recognize texts set in a different font, be sure to disable the user pattern by selecting the Use only built-in patterns option in Tools > Options… > Read.
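
Points 1 and 2 in the list above can be sketched in a few lines. This is a simplified illustration, not ABBYY FineReader's actual logic: the function names are hypothetical, and returning None when the neighbors are mixed or missing is an assumption, since the article does not describe those cases.

```python
# Point 1: left and right quotes are stored as the straight quote.
QUOTE_VARIANTS = {"\u2018": "'", "\u2019": "'"}


def normalize_trained_char(ch):
    """Return the character as it would be stored in the pattern."""
    return QUOTE_VARIANTS.get(ch, ch)


def resolve_circle(prev_char, next_char):
    """Point 2: pick 'O' or '0' for a small circle from its neighbors.

    Returns None when the neighbors are absent or mixed; that behavior
    is an assumption of this sketch, not something the article specifies.
    """
    neighbours = [c for c in (prev_char, next_char) if c]
    has_digit = any(c.isdigit() for c in neighbours)
    has_letter = any(c.isalpha() for c in neighbours)
    if has_digit and not has_letter:
        return "0"
    if has_letter and not has_digit:
        return "O"
    return None
```

The sketch makes the consequence of point 1 visible: whichever quote image is trained, the stored character is the straight quote, so curly quotes cannot appear in the recognized text.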

Editing a user pattern

You may wish to edit your newly created pattern before launching the OCR process. An incorrectly trained pattern may adversely affect OCR quality. A pattern should contain only whole characters or ligatures. Characters with cut edges and characters with incorrect letter correspondences should be removed from the pattern.

  1. On the Tools menu, click Pattern Editor….
  2. The Pattern Editor dialog box will open. Select the desired pattern and click the Edit… button.
  3. In the User Pattern dialog box that opens, select a character and click the Properties… button.

In the dialog box that opens:

  • Enter the letter that corresponds to the character in the Character field.
  • Specify the desired font effect (bold, italic, superscript, or subscript) in the Effect field.

Click the Delete button in the User Pattern dialog box to delete a character that has been trained incorrectly.

14.01.2020 17:26:19
