Creating a New Recognition Language

Verification Station uses data about the document language when recognizing text. The program may fail to recognize some characters in documents with uncommon elements (e.g. code numbers) because the document language might not contain these characters. To recognize such documents, you can create a custom language that has all of the necessary characters. You can also create groups of several OCR languages and use these groups when recognizing documents.
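Before creating a custom language, it can help to know which characters your documents use that the base alphabet lacks. The following is a minimal sketch that runs outside Verification Station. The function name `missing_characters`, the sample strings, and the stand-in alphabet are all illustrative assumptions; in practice you would paste in the alphabet shown in the Language Properties dialog box.

```python
import string


def missing_characters(samples, alphabet):
    """Return characters that occur in samples but not in alphabet.

    Whitespace is skipped. Characters are returned in first-seen order.
    """
    known = set(alphabet)
    missing = []
    for sample in samples:
        for ch in sample:
            if ch.isspace() or ch in known or ch in missing:
                continue
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Stand-in alphabet (assumption): plain Latin letters only.
    base_alphabet = string.ascii_letters
    # Hypothetical code numbers copied from sample documents.
    samples = ["INV-2024/017", "Ref #A7"]
    print("".join(missing_characters(samples, base_alphabet)))
```

The result is only a list of candidates. Whether a given character belongs in the alphabet or in one of the advanced language properties is still decided in the Language Editor.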

How to create a user language

  1. On the Tools menu, click Language Editor….
  2. Click the New... button.
  3. In the dialog box that opens, select the Create a new language based on an existing one option, select the language which you want to use as a base for the new language, and click OK.
  4. The Language Properties dialog box will open. In this dialog box:
    1. Type a name for your new language.
    2. The language you selected in the New Language or Group dialog box is displayed in the Source language drop-down list. You can select a different language from this drop-down list.
    3. The Alphabet field contains the alphabet of the base language. If you want to edit the alphabet, click the button.
    4. The Dictionary option group contains several options for the dictionary that will be used by the program when recognizing text and checking the result:
      • None
        The language will not have a dictionary.
      • Built-in dictionary
        The program's built-in dictionary will be used.
      • User dictionary
        Click the Edit... button to specify dictionary terms or import an existing custom dictionary or a text file with Windows-1252 encoding (terms must be separated by spaces or other characters that are not in the alphabet).
        Words from the user dictionary will not be marked as misspelled when the spelling in the recognized text is checked. They may be written in all lower-case or all upper-case letters, or may begin with an upper-case letter.
        Word in the dictionary    Words that will not be considered misspelled during a spelling check
        abc                       abc, Abc, ABC
        Abc                       abc, Abc, ABC
        ABC                       abc, Abc, ABC
        aBc                       aBc, abc, Abc, ABC
      • Regular expression
        This option allows you to create a user dictionary using regular expressions.
        See also: Regular expressions.
  5. Languages can have several additional properties. To change these properties, click the Advanced... button to open the Advanced Language Properties dialog box, where you can specify:
    • Non-letter characters that may occur at the beginning of words
    • Non-letter characters that may occur at the end of words
    • Standalone non-letter characters (punctuation marks, etc.)
    • Characters to be ignored if they occur inside words
    • Prohibited characters that may never occur in texts written in this language
    • The Text may contain Arabic numerals, Roman numerals, and abbreviations option
    • All the characters of the language that will be recognized.
  6. You can now select the newly created language when choosing OCR languages.
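The User dictionary option above accepts a text file in Windows-1252 encoding with terms separated by spaces, and treats several case variants of each term as correctly spelled. The sketch below illustrates both points under stated assumptions: the case rule is inferred from the example table and is not the program's actual implementation, and the function names `accepted_spellings`, `encode_dictionary`, and `write_dictionary_file` are hypothetical helpers, not part of Verification Station.

```python
def accepted_spellings(term):
    """Return the spellings assumed to pass the spelling check for one term.

    Assumed rule, read off the example table: the term as entered, all
    lower-case, first letter upper-case, and all upper-case.
    """
    candidates = [term, term.lower(), term.capitalize(), term.upper()]
    unique = []
    for form in candidates:
        if form not in unique:  # drop duplicates, keep first-seen order
            unique.append(form)
    return unique


def encode_dictionary(terms):
    """Join terms with single spaces and encode them as Windows-1252 bytes."""
    for term in terms:
        # A space is the separator, so a term cannot contain whitespace.
        if not term or any(ch.isspace() for ch in term):
            raise ValueError(f"invalid dictionary term: {term!r}")
    # Raises UnicodeEncodeError if a character does not exist in Windows-1252.
    return " ".join(terms).encode("cp1252")


def write_dictionary_file(terms, path):
    """Write the terms to path as a Windows-1252 text file."""
    with open(path, "wb") as handle:
        handle.write(encode_dictionary(terms))


if __name__ == "__main__":
    for word in ("abc", "Abc", "ABC", "aBc"):
        print(word, "->", ", ".join(accepted_spellings(word)))
```

Calling `write_dictionary_file(["INV", "Müller"], "user_dictionary.txt")` would produce a file that could then be imported through the Edit... button. The file name here is only an example.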

You can save all user patterns and languages as a single file. To do this, click Tools > Save Patterns and Languages....

See also

Document Language

26.03.2024 13:49:49
