Document Features to Consider Prior to OCR

The quality of images has a significant impact on recognition quality. This section explains what factors you should take into account before recognizing images.

Document languages

ABBYY FineReader recognizes both single-language and multi-language documents (i.e. documents written in two or more languages). For multi-language documents, you need to select several recognition languages.

To specify an OCR language for your document, in the Document Language drop-down list on the main toolbar or in the Task window, select one of the following:

  • Autoselect

ABBYY FineReader will automatically select the appropriate languages from the user-defined list of languages. To modify this list:

  1. Select More languages…
  2. In the Language Editor dialog box, select the Automatically select document languages from the following list option.
  3. Click the Specify… button.
  4. In the Languages dialog box, select the desired languages.

  • A language or a combination of languages

Select a language or a language combination. The list of languages includes recently used recognition languages, as well as English, German, and French.

  • More languages…

Select this option if the language you need is not visible in the list.

In the Language Editor dialog box, select the Specify languages manually option and then select the desired language or languages by checking the appropriate boxes. If you often use a particular language combination, you can create a new group for these languages.

If a language is not in the list, it is either:

  1. not supported by ABBYY FineReader, or

For a complete list of supported languages, see "Supported Languages."

  2. not supported by your copy of the software.

The complete list of languages available in your copy can be found in the Licenses dialog box (Help > About… > License Info).
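
The two cases above can be told apart mechanically if both lists are at hand. The sketch below is purely illustrative: the language sets are made-up samples, and `explain_missing_language` is a hypothetical helper, not part of ABBYY FineReader.

```python
# Hypothetical sample data. The real lists come from "Supported Languages"
# and from the Licenses dialog box (Help > About… > License Info).
PRODUCT_LANGUAGES = {"English", "German", "French", "Japanese", "Korean"}
LICENSED_LANGUAGES = {"English", "German", "French"}


def explain_missing_language(language,
                             product=PRODUCT_LANGUAGES,
                             licensed=LICENSED_LANGUAGES):
    """Return why a language is, or is not, available for recognition."""
    if language not in product:
        # Case 1: the product itself does not support the language.
        return "not supported by the product"
    if language not in licensed:
        # Case 2: the product supports it, but this copy is not licensed for it.
        return "not supported by this copy (license)"
    return "available"


for name in ("German", "Japanese", "ExampleLang"):
    print(name, "->", explain_missing_language(name))
```

Checking the product list first mirrors the order of the two cases: a language missing from the product can never appear in a license.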

In addition to using built-in languages and language groups, you can create your own. For details, see "If the Program Fails to Recognize Some of the Characters."

Print type

Documents may be printed on various devices such as typewriters and fax machines. OCR quality can be improved by selecting the correct Document type in the Options dialog box.

For most documents, the program will detect the print type automatically. For automatic print type detection, the Auto option must be selected under Document type in the Options dialog box (Tools > Options…). You can process the document in full-color or black-and-white mode.

You may also choose to manually select the print type as needed.

An example of typewritten text. All letters are of equal width (compare, for example, "w" and "t"). For texts of this type, select Typewriter.
An example of text produced by a fax machine. The letters are unclear in some places, and the image contains noise and distortion. For texts of this type, select Fax.
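
The equal-width property of typewritten text can be checked numerically. The following is a minimal sketch, assuming the character cell widths (in pixels) are already available from some segmentation step. The function name, the sample widths, and the 10% tolerance are illustrative assumptions. This section does not describe how the Auto option detects the print type.

```python
def looks_monospaced(cell_widths, tolerance=0.10):
    """Return True if every width lies within `tolerance` of the mean width."""
    if not cell_widths:
        raise ValueError("need at least one character width")
    mean = sum(cell_widths) / len(cell_widths)
    return all(abs(w - mean) <= tolerance * mean for w in cell_widths)


# Made-up widths in pixels for the letters "w", "t", "i", and "m".
typewritten = {"w": 20, "t": 20, "i": 19, "m": 21}
proportional = {"w": 26, "t": 11, "i": 7, "m": 30}

print(looks_monospaced(list(typewritten.values())))
print(looks_monospaced(list(proportional.values())))
```

In the typewritten sample, "w" and "t" occupy the same width. In the proportional sample they differ by more than a factor of two, so the check fails.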

Tip: After recognizing typewritten texts or faxes, be sure to select Auto before processing regular printed documents.

Print quality

Poor-quality documents with "noise" (i.e. random black dots or speckles), blurred and uneven letters, or skewed lines and shifted table borders may require specific scanning settings.

Examples of poor-quality documents: a fax and a newspaper page.

Poor-quality documents are best scanned in grayscale. When scanning in grayscale, the program will select the optimal brightness value automatically.

The grayscale scanning mode retains more information about the letters in the scanned text, which helps achieve better OCR results when recognizing documents of medium to poor quality. You can also correct some defects manually using the image editing tools available in the Image Editor. For details, see "Image Preprocessing."
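
To illustrate what automatic brightness selection on a grayscale scan can look like, here is a sketch of Otsu's method, a well-known technique that picks the threshold best separating dark ink from light paper. The source does not say which method FineReader uses, so treat this as a generic illustration. The function name and the sample pixels are assumptions.

```python
def otsu_threshold(pixels):
    """Pick a threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    `pixels` is a flat, non-empty sequence of integers in the range 0-255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0      # number of pixels in the dark class so far
    sum_dark = 0         # sum of their gray levels
    best_t, best_score = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue                 # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                    # all pixels are already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor.
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Made-up scan: 10 dark ink pixels (level 30) on 30 paper pixels (level 200).
sample = [30] * 10 + [200] * 30
print(otsu_threshold(sample))
```

A threshold chosen this way adapts to each scan, which is only possible because the grayscale image still holds the full range of brightness levels.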

Color mode

If you do not need to preserve the original colors of a full-color document, you can process the document in black-and-white mode. This will greatly reduce the size of the resulting FineReader document and speed up the OCR process. However, processing low-contrast images in black and white may result in poor OCR quality. We also do not recommend black-and-white processing for photos, magazine pages, and texts in Chinese, Japanese, and Korean.
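
Both the size saving and the low-contrast risk can be shown with a small sketch. It assumes uncompressed pixel data (24 bits per pixel for color, 1 bit per pixel for black and white) and a fixed threshold of 128. Real files are compressed, and this section does not describe FineReader's own conversion, so the numbers are only indicative.

```python
def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes, ignoring row padding."""
    return width * height * bits_per_pixel // 8


def to_black_and_white(gray_rows, threshold=128):
    """Map each gray level (0-255) to pure black (0) or pure white (255)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray_rows]


# An A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
color_bytes = raw_size_bytes(2480, 3508, 24)
bw_bytes = raw_size_bytes(2480, 3508, 1)
print(color_bytes, bw_bytes, color_bytes // bw_bytes)

# One row of pixels: paper, ink, paper.
high_contrast = [[240, 20, 240]]    # dark ink on bright paper
low_contrast = [[150, 135, 150]]    # faint ink on grayish paper

print(to_black_and_white(high_contrast))  # the ink pixel stays black
print(to_black_and_white(low_contrast))   # the ink pixel turns white
```

In the low-contrast row, ink and paper fall on the same side of the threshold, so the character disappears from the black-and-white image and cannot be recovered from it.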

Note: You can also speed up recognition of color and black-and-white documents by selecting the Fast reading option on the Read tab of the Options dialog box. For more about the recognition modes, see "OCR Options."

To select a color mode:

  • Use the Color mode drop-down list in the Task dialog box or
  • Select one of the options under Color mode on the Document tab of the Options dialog box (Tools > Options…).

Important! Once the document is converted to black-and-white, you will not be able to restore the colors. To get a color document, open the file with color images or scan the paper document in color mode.

1/14/2020 5:26:19 PM
