Document features to consider prior to OCR

The quality of images has a significant impact on OCR quality. This section explains what factors you should take into account before recognizing images.

OCR languages

ABBYY FineReader can recognize both single-language and multi-language documents (i.e. documents written in two or more languages). For multi-language documents, you need to select several OCR languages.

To select OCR languages, click Options > Languages and select one of the following options:

  • Automatically select OCR languages from the following list
    ABBYY FineReader will automatically select the appropriate languages from the user-defined list of languages. To edit the list of languages:
    1. Make sure the Automatically select OCR languages from the following list option is selected.
    2. Click the Specify... button.
    3. In the Languages dialog box, select the desired languages and click OK.
    4. In the Options dialog box, click OK.
  • Specify OCR languages manually
    Select this option if the language you need is not in the list.

In the dialog box below, specify one or more languages. If you often use a particular language combination, you can create a new group for these languages.

If a language is not in the list, it is either:

  1. Not supported by ABBYY FineReader, or
    For a complete list of supported languages, see Supported OCR languages.
  2. Not supported by your version of the product.
    The complete list of languages available in your version of the product can be found in the Licenses dialog box (click Help > About > License Info to open this dialog box).

In addition to using built-in languages and language groups, you can create your own languages and groups. See also: If the program fails to recognize certain characters.

Print type

Documents may be printed using various devices such as typewriters and fax machines. OCR quality may vary depending on how a document was printed. You can improve OCR quality by selecting the correct print type in the Options dialog box.

For most documents, the program will detect their print type automatically. For automatic print type detection, the Auto option must be selected in the Document type group of options in the Options dialog box (click Tools > Options... > Recognition Languages to access these options). You can process documents in full-color or black-and-white mode.

You may also choose to manually select the print type as needed.

An example of typewritten text. All letters are of equal width (compare, for example, "w" and "t"). For texts of this type, select Typewriter.
An example of a text produced by a fax machine. As you can see from the example, the letters are not clear in some places. There is also some noise and distortion. For texts of this type, select Fax.

After recognizing typewritten texts or faxes, be sure to select Auto before processing regular printed documents.

Print quality

Poor-quality documents with "noise" (i.e. random black dots or speckles), blurred and uneven letters, or skewed lines and shifted table borders may require specific scanning settings.

Examples of poor-quality documents: a fax and a newspaper.

Poor-quality documents are best scanned in grayscale. When scanning in grayscale, the program will select the optimal brightness value automatically.

The grayscale scanning mode retains more information about the letters in the scanned text to achieve better OCR results when recognizing documents of medium to poor quality. You can also correct some of the defects manually using the image editing tools available in the Image Editor. See also: If your document image has defects and OCR accuracy is low.
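The effect described above can be illustrated with a minimal Python sketch. It is not how ABBYY FineReader selects brightness; the pixel values, the fixed threshold of 128, and the midpoint rule are all assumptions chosen for illustration. The sketch shows a faint stroke that disappears when the image is reduced to black-and-white with a fixed threshold, but survives when the grayscale values are kept and the threshold is chosen from the data afterwards.

```python
def binarize(pixels, threshold):
    """Map each 0-255 grayscale value to 0 (black) or 1 (white)."""
    return [1 if p >= threshold else 0 for p in pixels]


# Hypothetical scanline: a faint grey stroke (about 150) on a light
# background (about 220), as might appear in a low-contrast fax.
scanline = [220, 218, 150, 148, 152, 221, 219]

# Black-and-white capture with a fixed threshold: every value is above
# 128, so the whole line turns white and the stroke is lost.
fixed = binarize(scanline, 128)

# With the grayscale values still available, a threshold can be picked
# from the data (here, the midpoint of the darkest and lightest value).
adaptive_threshold = (min(scanline) + max(scanline)) // 2
adaptive = binarize(scanline, adaptive_threshold)

print(fixed)
print(adaptive)
```

In the second result the three stroke pixels come out black, which is the extra information the grayscale mode preserves for the recognition step.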

Color mode

If you do not need to preserve the original colors of a full-color document, you can process the document in black-and-white mode. This will greatly reduce the size of the resulting OCR project and speed up the OCR process. However, processing low-contrast images in black-and-white may result in poor OCR quality. We also do not recommend black-and-white processing for photos, magazine pages, and texts in Chinese, Japanese, and Korean.

Tip. You can also speed up the OCR of color and black-and-white documents by selecting Fast recognition on the Recognition Languages tab of the Options dialog box. For more about the recognition modes, see OCR Options.

For some additional recommendations on selecting the right color mode, see Scanning tips.

Once the document is converted to black-and-white, you will not be able to restore the colors. To get a color document, open a file with color images or scan the paper document in color mode.
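The following Python sketch illustrates, under stated assumptions, why black-and-white conversion both shrinks the data and cannot be undone. The function names, the threshold of 128, and the sample colors are hypothetical and do not describe ABBYY FineReader's internal processing; the sizes are for raw, uncompressed pixel data only.

```python
def luminance(rgb):
    """Approximate perceived brightness (0-255) using ITU-R BT.601 weights."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def to_black_and_white(rgb, threshold=128):
    """Return 1 (white) or 0 (black); the threshold is an illustrative assumption."""
    return 1 if luminance(rgb) >= threshold else 0


def uncompressed_bytes(width, height, bits_per_pixel):
    """Raw image size, before any file-format compression."""
    return width * height * bits_per_pixel // 8


red, blue = (200, 30, 30), (30, 30, 200)

# Two clearly different colors collapse to the same black pixel, so the
# original color cannot be recovered from the black-and-white result.
red_bit = to_black_and_white(red)
blue_bit = to_black_and_white(blue)
print(red_bit, blue_bit)

# An A4 page at 300 dpi is about 2480 x 3508 pixels.
color_size = uncompressed_bytes(2480, 3508, 24)  # 24-bit color
bw_size = uncompressed_bytes(2480, 3508, 1)      # 1-bit black-and-white
print(color_size // bw_size)
```

Because many colors map to the same single bit, the only way to get the colors back is to return to the original color file or rescan the page in color.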

02.11.2018 16:19:18
