If a document you want to recognize contains decorative fonts or special characters (e.g. mathematical symbols) that are unfamiliar to the program, we recommend using pattern training to improve OCR accuracy. Patterns are created by associating images of characters as they occur in the text with their respective keyboard counterparts. This is done by simply pressing the right key on the keyboard when the program indicates a character image it cannot recognize. Sometimes you will have to confirm the same character more than once from the keyboard, because the computer will detect finer irregularities between character images that are imperceptible to the naked eye. This is termed pattern training and effectively reinforces the association between sets of character images and their keyboard counterparts.
It is not advisable to use the training mode in other cases, as the gains in OCR quality will be insignificant compared to the time and effort you will spend on training.
Pattern training is not supported for Asian languages.
In training mode, a user pattern is created, which can be used when performing OCR on the entire text.
You may wish to edit your newly created pattern before launching the OCR process. An incorrectly trained pattern may adversely affect OCR quality. A pattern should contain only entire characters or ligatures. Characters with cut edges and characters with incorrect letter pairings should be removed from the pattern.
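To make the idea of a pattern concrete, the sketch below models a user pattern as a collection of character images, each tied to the keyboard character (or ligature) typed for it. It is only an illustration: the names `Sample`, `UserPattern`, `train`, and `remove_clipped` are hypothetical and do not reflect how the program stores patterns internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """One character image captured during training (hypothetical structure)."""
    image_id: str             # stand-in for the actual character bitmap
    text: str                 # keyboard character(s) typed for it, e.g. "a" or "fi"
    is_clipped: bool = False  # True if the image has cut edges


@dataclass
class UserPattern:
    name: str
    samples: List[Sample] = field(default_factory=list)

    def train(self, image_id: str, text: str, is_clipped: bool = False) -> None:
        # The same character may be confirmed several times; each
        # confirmation adds another image to that character's set.
        self.samples.append(Sample(image_id, text, is_clipped))

    def images_for(self, text: str) -> List[str]:
        # All image ids currently associated with one keyboard character.
        return [s.image_id for s in self.samples if s.text == text]

    def remove_clipped(self) -> int:
        """Drop samples with cut edges and return how many were removed."""
        before = len(self.samples)
        self.samples = [s for s in self.samples if not s.is_clipped]
        return before - len(self.samples)


pattern = UserPattern("decorative-font")
pattern.train("img-001", "a")
pattern.train("img-002", "a")    # same letter, slightly different image
pattern.train("img-003", "fi")   # a ligature entered as two characters
pattern.train("img-004", "e", is_clipped=True)
removed = pattern.remove_clipped()
```

Editing a pattern before OCR corresponds to the `remove_clipped` step here: incomplete images are discarded so that only entire characters and ligatures remain.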
Creating and editing patterns
In the Pattern Editor (Tools > Pattern Editor...), you can create a new pattern, select a pattern to be used for OCR, or edit and delete existing patterns:
- To create a new pattern, click New... and enter a name for your pattern in the Pattern Editor... dialog box. Click OK. Now you can train the program to recognize character images that are new to it.
- To edit an existing pattern:
  - In the Pattern Editor... dialog box, select the pattern you want to edit and click Edit....
  - In the User Pattern dialog box, select a character and click Properties....
  - In the Properties dialog box that opens:
    - In the Character field, type the keyboard character that corresponds to the character image currently highlighted on the screen.
    - If you want to preserve the text effects in the recognized text, select the required text effect (i.e. italic, bold, superscript or subscript) in the Effects group of options.
- To rename a pattern, select it in the list, click Rename..., and enter a new name in the Pattern name field.
- To make a pattern active, select it in the list in the User Pattern dialog box and click Set Active.
- To delete a pattern, select it in the list in the User Pattern dialog box and click Delete.
To train a user pattern to recognize new characters and ligatures:
- Click Tools > Pattern Training.
- When the program encounters a character or ligature it cannot recognize, the Pattern Training dialog box will pop up showing the image of the unknown character or ligature.
- Press the corresponding character or character sequence on the keyboard.
- Ligatures are sequences of two or three characters that are printed so close together that they appear as one character to the program.
- If you want to preserve the text effects, select the required text effect (i.e. italic, bold, superscript or subscript) in the Effects group of options.
- As training progresses from one unrecognized character to the next, you can go back to the previously trained character image by clicking the Back button. The enclosing frame will shift onto the previous character image and the latest "character image/keyboard character" association will be discarded. The Back button only operates within the current word and won't go further back than its first letter.
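The behavior of the Back button described above can be sketched as a simple undo stack that is scoped to one word. This is a minimal model, not the program's implementation; the class `WordTrainingSession` and its methods are hypothetical names chosen for illustration.

```python
class WordTrainingSession:
    """Tracks the associations made within a single word (illustrative only)."""

    def __init__(self, image_ids):
        self.image_ids = list(image_ids)  # character images of the current word, in order
        self.associations = []            # (image_id, typed_text) pairs confirmed so far

    @property
    def position(self):
        # Index of the character image the enclosing frame is currently on.
        return len(self.associations)

    def confirm(self, typed_text):
        # Associate the framed image with the typed character(s) and move on.
        if self.position >= len(self.image_ids):
            raise IndexError("no more character images in this word")
        self.associations.append((self.image_ids[self.position], typed_text))

    def back(self):
        """Discard the latest association; return False at the first letter."""
        if not self.associations:
            return False  # cannot go further back than the start of the word
        self.associations.pop()
        return True


session = WordTrainingSession(["img-1", "img-2", "img-3"])
session.confirm("c")
session.confirm("l")   # typed the wrong key
session.back()         # frame returns to img-2, the "l" association is dropped
session.confirm("h")
```

Because the stack is created per word, going back never reaches into a previous word, which matches the limit stated above.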
Important! Patterns can only be trained for characters from the alphabet used in the text. If a ligature or character has no corresponding key on the keyboard, you can either consecutively press the two keys that together will make up the required ligature, or click the button and select the required character in the Insert Character dialog box.
Important! One pattern may contain up to 1000 new characters. Note, however, that having too many ligatures in a pattern may adversely affect the quality of OCR.
Some important things you should know about pattern training
- The OCR engine makes no distinction between certain character images and associates them with one and the same keyboard character. For example, straight apostrophes ('), left single quotation marks (‘), and right single quotation marks (’) will all be associated with the straight apostrophe keyboard character. This means that left and right quotation marks will never be reproduced in recognized texts, even if you enter the respective keyboard characters in pattern training mode.
- For some character images, the OCR engine will choose keyboard counterparts based on the wider context. For example, the image of a circle may be either a zero or the letter O, and the OCR engine will choose between the two alternatives by looking at the neighboring characters. If the circle is surrounded by digits, the program will decide in favor of zero. Otherwise, it will interpret the circle as the letter O.
- A trained pattern can only be used to recognize text printed in the same font type and size and scanned at the same resolution as the image on which the pattern was trained.
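The three points above can be illustrated with a short sketch. It is a deliberately simplified model under stated assumptions: the actual rules used by the OCR engine are not documented here, and the names `normalize_apostrophes`, `resolve_circle`, `ScanProfile`, and `pattern_applies` are hypothetical.

```python
from typing import NamedTuple

# Left and right single quotation marks collapse to the straight apostrophe.
QUOTE_VARIANTS = {"\u2018": "'", "\u2019": "'"}


def normalize_apostrophes(text):
    """Map every single-quote variant to the straight apostrophe."""
    return "".join(QUOTE_VARIANTS.get(ch, ch) for ch in text)


def resolve_circle(left, right):
    """Pick '0' or 'O' for a circle image from its neighbors (simplified rule).

    Assumption: the circle counts as "surrounded by digits" only when every
    existing neighbor is a digit; an empty string means "no neighbor".
    """
    neighbors = [c for c in (left, right) if c]
    if neighbors and all(c.isdigit() for c in neighbors):
        return "0"
    return "O"


class ScanProfile(NamedTuple):
    """Conditions under which text was printed and scanned (hypothetical)."""
    font: str
    size_pt: float
    dpi: int


def pattern_applies(trained_on, document):
    # A pattern is reusable only if font, size, and resolution all match.
    return trained_on == document


trained = ScanProfile(font="Fraktur", size_pt=12, dpi=300)
same_scan = ScanProfile(font="Fraktur", size_pt=12, dpi=300)
other_scan = ScanProfile(font="Fraktur", size_pt=12, dpi=600)
```

In this model a document scanned at 600 dpi would need its own pattern even though the font and size are unchanged.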