Working with complex-script languages

With ABBYY FineReader, you can recognize documents in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean. Documents in Chinese, Japanese, or Korean (CJK), and documents that combine CJK and European languages, require some additional considerations.

Recommended fonts

Recognition of text in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean may require additional fonts to be installed. The table below lists the recommended fonts for texts in these languages.

OCR languages | Recommended fonts
Arabic | Arial™ Unicode™ MS
Hebrew | Arial™ Unicode™ MS
Yiddish | Arial™ Unicode™ MS
Thai | Arial™ Unicode™ MS, Aharoni, David, Levenim mt, Miriam, Narkisim, Rod
Chinese (Simplified), Chinese (Traditional), Japanese, Korean, Korean (Hangul) | Arial™ Unicode™ MS; SimSun fonts such as SimSun (Founder Extended), SimSun-18030, NSimSun; Simhei; YouYuan; PMingLiU; MingLiU; Ming(for-ISO10646); STSong
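The font recommendations above can also be checked with a short script. The following is a minimal Python sketch, not part of ABBYY FineReader: the function name `installed_recommended` and the mapping `RECOMMENDED_FONTS` are hypothetical, and the list of installed font names is assumed to come from your operating system by whatever means you prefer.

```python
# Recommended fonts per OCR language, copied from the table above.
# The names here are illustrative; this is not an ABBYY FineReader API.
CJK_FONTS = [
    "Arial Unicode MS", "SimSun (Founder Extended)", "SimSun-18030",
    "NSimSun", "Simhei", "YouYuan", "PMingLiU", "MingLiU",
    "Ming(for-ISO10646)", "STSong",
]

RECOMMENDED_FONTS = {
    "Arabic": ["Arial Unicode MS"],
    "Hebrew": ["Arial Unicode MS"],
    "Yiddish": ["Arial Unicode MS"],
    "Thai": ["Arial Unicode MS", "Aharoni", "David", "Levenim mt",
             "Miriam", "Narkisim", "Rod"],
    "Chinese (Simplified)": CJK_FONTS,
    "Chinese (Traditional)": CJK_FONTS,
    "Japanese": CJK_FONTS,
    "Korean": CJK_FONTS,
    "Korean (Hangul)": CJK_FONTS,
}


def installed_recommended(language, installed_fonts):
    """Return the recommended fonts for `language` that appear in
    `installed_fonts`, comparing names case-insensitively."""
    installed = {name.casefold() for name in installed_fonts}
    return [font for font in RECOMMENDED_FONTS[language]
            if font.casefold() in installed]
```

An empty result suggests that none of the recommended fonts for that language is installed and that one of them may need to be added before recognition.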

The sections below contain advice on improving recognition accuracy.

Disabling automatic image processing

By default, any pages you add to an OCR project are automatically recognized.

However, if your document contains text in a CJK language combined with a European language, we recommend disabling automatic detection of page orientation and using the dual page splitting option only if all of the page images have the correct orientation (e.g. they were not scanned upside down).

You can enable or disable the Correct page orientation and Split facing pages options on the Image Processing tab of the Options dialog box (click Tools > Options... to open this dialog box).

To split facing pages in Arabic, Hebrew, or Yiddish, select the corresponding OCR language first and only then select the Split facing pages option. You can also restore the original page numbering by selecting the Swap book pages option. See also: OCR projects.

If your document has a complex structure, we recommend disabling automatic analysis and OCR for images and performing these operations manually.

To turn off automatic analysis and OCR of newly added images:

  1. Click Tools > Options... to open the Options dialog box.
  2. On the Image Processing tab, clear the Automatically process page images as they are added to the OCR Editor option.
  3. Click OK.

Recognizing documents written in more than one language

The instructions below are provided as an example and explain how to recognize a document that contains both English and Chinese text. Documents that contain other languages can be recognized in a similar manner.

  1. On the main toolbar, select More languages... from the list of languages. In the Language Editor dialog box, select Specify OCR languages manually and select Chinese and English from the list of languages.
  2. Scan your pages or open your images.
  3. If the program fails to detect all of the areas on an image:
    • Specify areas manually using the area editing tools.
    • Select any areas that contain only one language and, in the Area Properties pane, select English or Chinese as appropriate.
      A language can only be specified for areas of the same type. If you selected areas of different types, such as Text and Table, you will not be able to specify a language.
    • If necessary, select the text direction from the Orientation drop-down list (for details, see If vertical or inverted text was not recognized).
    • For texts in CJK languages, select the text direction from the Direction of CJK text drop-down list (for details, see Editing area properties).
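When deciding whether an area contains only one language, it can help to check a text sample for the scripts it uses. The sketch below is a rough illustration in Python and is unrelated to how ABBYY FineReader detects languages. The function name `scripts_in` is hypothetical, and the check is deliberately narrow: it recognizes only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and unaccented ASCII letters, so kana, Hangul, and accented Latin letters are ignored.

```python
def scripts_in(text):
    """Return the set of scripts found in `text`.

    Simplified assumption: "cjk" means the basic CJK Unified Ideographs
    block only, and "latin" means ASCII letters only.
    """
    found = set()
    for ch in text:
        code_point = ord(ch)
        if 0x4E00 <= code_point <= 0x9FFF:
            found.add("cjk")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found


# A sample that mixes English and Chinese reports both scripts,
# which suggests selecting both OCR languages for that area.
mixed = scripts_in("Hello 世界")
```

A sample that reports a single script is a candidate for an area with one OCR language selected; a sample that reports both scripts needs both languages.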

If non-European characters are not displayed in the Text pane

If text in a CJK language is displayed incorrectly in the Text pane, you may have selected the Plain text mode.

To change the font used in Plain text mode:

  1. Click Tools > Options... to open the Options dialog box.
  2. Click the Areas and Text tab.
  3. Select Arial Unicode MS from the Font used to display plain text drop-down list.
  4. Click OK.

If this does not help and text in the Text pane is still displayed incorrectly, see Incorrect font is used or some characters are replaced with "?" or "□".

Changing the direction of recognized text

ABBYY FineReader detects text direction automatically, but you can also specify text direction manually.

  1. Activate the Text pane.
  2. Select one or more paragraphs.
  3. Click the text direction button on the toolbar in the Text pane.

You can use the Direction of CJK text drop-down list in the Image pane to specify the direction of text prior to OCR. See also: Editing area properties.
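To verify the direction of exported text outside the program, one option is to look at the Unicode bidirectional class of its characters. The following Python sketch illustrates that idea only; it is not the method ABBYY FineReader uses, and the function name `dominant_direction` is hypothetical. It covers left-to-right and right-to-left text and does not address vertical CJK text.

```python
import unicodedata


def dominant_direction(text):
    """Return "rtl" if right-to-left letters outnumber left-to-right
    letters in `text`, otherwise "ltr"."""
    rtl = 0
    ltr = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew-type and Arabic-type letters
            rtl += 1
        elif bidi_class == "L":         # left-to-right letters
            ltr += 1
    return "rtl" if rtl > ltr else "ltr"
```

Digits, spaces, and punctuation are not counted, so a paragraph of Hebrew or Arabic text that contains a few numbers is still classified as right-to-left.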

02.11.2018 16:19:18

