If the program fails to recognize certain characters

ABBYY FineReader PDF uses information about the document language when recognizing text. The program may fail to recognize some characters in documents that contain uncommon elements (for example, code numbers), because the alphabet of the document language may not include these characters. To recognize such documents, you can create a custom language that contains all of the necessary characters. You can also create groups of several OCR languages and use these groups when recognizing documents.
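
To decide which characters a custom language needs, it can help to list the characters in a sample of your documents that the base language's alphabet does not cover. The sketch below is a hypothetical helper that runs outside FineReader. It assumes you have the sample as plain text and the base alphabet as a plain string; the `ENGLISH_ALPHABET` value is an illustrative assumption, not the alphabet FineReader actually ships for English. Whitespace is ignored.

```python
def missing_characters(sample_text: str, alphabet: str) -> list[str]:
    """Return the characters in sample_text that are not in alphabet.

    Whitespace is skipped, and the result is sorted so it is easy to
    review. Hypothetical helper for illustration only.
    """
    allowed = set(alphabet)
    missing = {ch for ch in sample_text if not ch.isspace() and ch not in allowed}
    return sorted(missing)


# Illustrative assumption: bare Latin letters plus digits,
# NOT the exact alphabet FineReader uses for English.
ENGLISH_ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)

if __name__ == "__main__":
    sample = "Part No. AX-42/7 #B"
    print(missing_characters(sample, ENGLISH_ALPHABET))
```

Treat the output as a list of candidates to review rather than characters to add blindly: punctuation such as `.` or `-` may already be handled by the language's other character settings.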

How to create a user language

  1. Open the Options dialog box (click Tools > Options...) and click the Languages tab.
  2. Click the New... button.
  3. In the dialog box that opens, select the Create a new language based on an existing one option, select the language which you want to use as a base for the new language, and click OK.
  4. The Language Properties dialog box will open. In this dialog box:
    1. Type a name for your new language.
    2. The language you selected in the New Language or Group dialog box is displayed in the Source language drop-down list. You can select a different language from this drop-down list.
    3. The Alphabet field contains the alphabet of the base language. To edit the alphabet, click the button.
    4. The Dictionary option group lets you choose the dictionary that the program will use when recognizing text and checking the result:
      • None
        The language will not have a dictionary.
      • Built-in dictionary
        The program's built-in dictionary will be used.
      • User dictionary
        Click the Edit... button to specify dictionary terms or import an existing custom dictionary or a text file with Windows-1252 encoding (terms must be separated by spaces or other characters that are not in the alphabet).
        Words from the user dictionary will not be marked as misspelled when the spelling in the recognized text is checked. They may be written in all lower-case or all upper-case letters, or may begin with an upper-case letter.
        Word in the dictionary    Words not considered misspelled during a spelling check
        abc                       abc, Abc, ABC
        Abc                       abc, Abc, ABC
        ABC                       abc, Abc, ABC
        aBc                       aBc, abc, Abc, ABC
      • Regular expression
        Lets you create a user dictionary using regular expressions.
        See also: Regular expressions.
    5. Languages can have several additional properties. To change them, click the Advanced... button to open the Advanced Language Properties dialog box, where you can specify:
      • Characters that can begin or end a word
      • Non-letter characters that appear separately from words
      • Characters that may appear inside words but should be ignored
      • Characters that cannot appear in texts recognized using this language (prohibited characters)
      • The Text may contain Arabic numerals, Roman numerals, and abbreviations option
  5. You can now select the newly created language when choosing OCR languages.
    For more on OCR languages, see OCR languages.
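
The case rule for user dictionary words can be written as a short function. This is a minimal sketch that reproduces the behavior listed in the table under the User dictionary option; it is not FineReader's implementation, and the function names are hypothetical. It assumes that "begin with an upper-case letter" means an upper-case first letter followed by lower-case letters, which is consistent with the table.

```python
def accepted_forms(dictionary_word: str) -> set[str]:
    """Spellings of a user-dictionary word that are not flagged as misspelled.

    The word as entered is accepted, plus its all-lower-case,
    all-upper-case, and initial-capital forms.
    """
    return {
        dictionary_word,               # exactly as entered, e.g. "aBc"
        dictionary_word.lower(),       # all lower case: "abc"
        dictionary_word.upper(),       # all upper case: "ABC"
        dictionary_word.capitalize(),  # first letter upper, rest lower: "Abc"
    }


def is_accepted(word: str, user_dictionary: list[str]) -> bool:
    """True if word matches an accepted form of any dictionary entry."""
    return any(word in accepted_forms(entry) for entry in user_dictionary)


if __name__ == "__main__":
    for entry in ["abc", "Abc", "ABC", "aBc"]:
        print(entry, sorted(accepted_forms(entry)))
```

Note that a mixed-case entry such as "aBc" is the only one that adds a spelling beyond the three standard forms; other mixed-case variants, such as "aBC" for the entry "abc", would still be flagged.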

By default, user languages are saved in the OCR project folder. You can also save all user patterns and languages to a single file. To do this, open the Options dialog box (click Tools > Options...), click the OCR tab, and then click the Save Patterns and Languages... button.
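
The User dictionary option can import a plain-text file in Windows-1252 encoding. If you generate such a file from a script, a sketch like the one below writes the terms one per line; it assumes a line break counts as a character outside the alphabet and therefore separates terms. The function name and file path are hypothetical. Encoding with Python's `cp1252` codec makes the script raise an error for characters the code page cannot represent, instead of writing them incorrectly.

```python
from pathlib import Path


def write_dictionary_file(terms: list[str], path: str) -> None:
    """Write terms to a Windows-1252 text file, one term per line.

    Raises UnicodeEncodeError if a term contains a character that
    Windows-1252 cannot represent (for example, Greek or Cyrillic letters).
    """
    Path(path).write_text("\n".join(terms) + "\n", encoding="cp1252")


if __name__ == "__main__":
    # Hypothetical output file; import it with the Edit... button.
    write_dictionary_file(["FineReader", "café"], "terms.txt")
```

Terms should contain only characters that are in the language's alphabet; any other character, such as a hyphen that is not in the alphabet, would split a term in two.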

Creating a language group

If you are going to use a particular language combination regularly, you may wish to group the languages together for convenience.

  1. Open the Options dialog box (click Tools > Options... to open this dialog box) and click the Languages tab.
  2. Click the New... button.
  3. In the New Language or Group dialog box, select the Create a new group of languages option, and click OK.
  4. The Language Group Properties dialog box will open. In this dialog box, specify a name for the language group and select the languages you want to include in the group.
    If you know that your text will not contain certain characters, you may wish to explicitly specify these so-called prohibited characters. Doing this can increase the speed and accuracy of OCR. To specify these characters, click the Advanced... button in the Language Group Properties dialog box and enter the prohibited characters in the Prohibited characters field.
  5. Click OK.

The new group will appear in the drop-down list of languages on the main toolbar.
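
Before entering prohibited characters, you might check them against a few representative documents that you already have as text. The sketch below is a hypothetical helper that runs outside FineReader: it assumes plain-text samples and only counts occurrences, so it cannot show how FineReader itself uses prohibited characters during OCR.

```python
from collections import Counter


def prohibited_conflicts(prohibited: str, samples: list[str]) -> Counter:
    """Count how often each planned prohibited character occurs in samples.

    An empty result means none of the samples uses any of these characters,
    so prohibiting them is unlikely to remove legitimate text of this kind.
    """
    banned = set(prohibited)
    return Counter(ch for text in samples for ch in text if ch in banned)


if __name__ == "__main__":
    samples = ["Invoice #1024", "Total: 15.00"]
    print(prohibited_conflicts("@#~", samples))
```

Any character that shows up in the result probably should not be prohibited for that kind of document.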

By default, user language groups are saved in the OCR project folder. You can also save all user patterns and languages to a single file. To do this, open the Options dialog box (click Tools > Options...), click the OCR tab, and then click the Save Patterns and Languages... button.

Tip. You can also use the drop-down list of languages on the main toolbar to select a combination of OCR languages without creating a group:

  1. Select More languages... from the drop-down list of languages on the main toolbar.
  2. In the Language Editor dialog box, select the Specify OCR languages manually option.
  3. Select the desired languages and click OK.

6/12/2024 2:29:42 PM
