Working with Complex-Script Languages


With ABBYY FineReader, you can recognize documents in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean. Additional considerations apply when working with documents in Chinese, Japanese, or Korean (CJK) and with documents that combine CJK and European languages.

Installing language support

To recognize texts written in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean, you may need to install support for these languages in Windows.

Microsoft Windows 8, Windows 7, and Windows Vista support these languages by default.

To install new languages in Microsoft Windows XP:

  1. Click Start on the taskbar.
  2. Click Control Panel > Regional and Language Options.
  3. Click the Languages tab and select the following options:
    • Install files for complex script and right-to-left languages (including Thai), to enable support for Arabic, Hebrew, Yiddish, and Thai
    • Install files for East Asian languages, to enable support for Japanese, Chinese, and Korean
  4. Click OK.

Recommended fonts

Recognition of text in Arabic, Hebrew, Yiddish, Thai, Chinese, Japanese, and Korean may require the installation of additional fonts in Windows. The table below lists the recommended fonts for texts in these languages.

OCR language | Recommended fonts
Arabic | Arial™ Unicode™ MS*
Hebrew | Arial™ Unicode™ MS*
Yiddish | Arial™ Unicode™ MS*
Thai | Arial™ Unicode™ MS*, Aharoni, David, Levenim MT, Miriam, Narkisim, Rod
Chinese (Simplified), Chinese (Traditional), Japanese, Korean, Korean (Hangul) | Arial™ Unicode™ MS*; SimSun fonts, such as SimSun (Founder Extended), SimSun-18030, and NSimSun; SimHei; YouYuan; PMingLiU; MingLiU; Ming(for-ISO10646); STSong

* This font is installed together with Microsoft Windows XP and Microsoft Office 2000 or later.
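You can check whether some recommended fonts are already present by looking for their files in the Windows Fonts folder. The Python sketch below is an unofficial helper and not part of FineReader. It assumes that Arial Unicode MS is stored as ARIALUNI.TTF and SimSun as simsun.ttc. The RECOMMENDED_FONT_FILES mapping and the missing_fonts function are hypothetical names, and you can extend the mapping for other fonts. The sketch checks only the system-wide Fonts folder.

```python
import os
from pathlib import Path

# Hypothetical mapping from font name to the file that usually contains it.
# ARIALUNI.TTF and simsun.ttc are assumed file names; check them on your system.
RECOMMENDED_FONT_FILES = {
    "Arial Unicode MS": "ARIALUNI.TTF",
    "SimSun": "simsun.ttc",
}


def missing_fonts(fonts_dir, font_files=RECOMMENDED_FONT_FILES):
    """Return, sorted, the names of fonts whose files are absent from fonts_dir.

    File names are compared case-insensitively, as Windows does.
    If fonts_dir is not an existing folder, every font is reported as missing.
    """
    directory = Path(fonts_dir)
    if not directory.is_dir():
        return sorted(font_files)
    present = {entry.name.lower() for entry in directory.iterdir() if entry.is_file()}
    return sorted(
        name for name, filename in font_files.items()
        if filename.lower() not in present
    )


if __name__ == "__main__":
    # The system-wide Fonts folder is normally %WINDIR%\Fonts.
    fonts_dir = Path(os.environ.get("WINDIR", r"C:\Windows")) / "Fonts"
    for name in missing_fonts(fonts_dir):
        print(f"Not found: {name}")
```

A font reported as missing may still be installed under a different file name. In that case, confirm the font name in the Windows Fonts folder before installing anything.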

The sections below contain advice on improving recognition accuracy.

Disabling automatic processing

By default, any pages you add to a FineReader document are automatically recognized.

However, if your document contains text in a CJK language combined with a European language, we recommend disabling automatic page orientation detection. Use the option for splitting facing pages only if all of the page images have the correct orientation (for example, none were scanned upside down).

You can enable or disable the Detect page orientation and Split facing pages options on the Scan/Open tab of the Options dialog box.

Note: To split facing pages in Arabic, Hebrew, or Yiddish, first select the corresponding recognition language, and only then select the Split facing pages option. This ensures that the pages are arranged in the correct order. You can also restore the original page numbering by selecting the Swap book pages option. For details, see "What Is a FineReader Document?"

If your document has a complex structure, we recommend disabling automatic analysis and OCR for images and performing these operations manually.

To disable automatic analysis and OCR:

  1. Open the Options dialog box (Tools > Options…).
  2. Clear the Automatically process pages as they are added option on the Scan/Open tab.
  3. Click OK.

Recognizing documents written in more than one language

The following example explains how to recognize a document that contains both English and Chinese text. Documents that combine other languages can be recognized in a similar way.

  1. On the main toolbar, select More languages… from the Document Languages drop-down list. In the Language Editor dialog box, select Specify languages manually, and then select Chinese and English in the language list.
  2. Scan or open the images.
  3. If the program fails to detect all of the areas on an image:
    • Specify the missing areas manually using the area editing tools.
    • Specify the language of any area that contains only one language: select the area and set its language in the Area Properties pane.

Important! You can specify a language only for areas of the same type. If the selected areas are of different types, such as Text and Table, you will not be able to specify a language.

  4. Click the Read button on the main toolbar.

If non-European characters are not displayed in the Text window

If text in a CJK language is displayed incorrectly in the Text window, you may have selected Plain text mode, and the font used in this mode may not support these characters.

To change the font used in Plain text mode:

  1. Open the Options dialog box (Tools > Options…).
  2. Click the View tab.
  3. Select Arial Unicode MS from the Font used to display plain text drop-down list.
  4. Click OK.

If text in the Text window is still displayed incorrectly, see "Incorrect Font Is Used or Some Characters Are Replaced with "?" or "□"."

Changing the direction of recognized text

ABBYY FineReader detects text direction automatically, but you can also specify text direction manually.

  1. Select one or more paragraphs in the Text window.
  2. Click the button on the toolbar of the Text window.

Note: You can use the Direction of CJK text drop-down list in the Image window to specify the direction of text prior to recognition. See "If Vertical or Inverted Text Is Not Recognized" for details.
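This article does not describe how FineReader detects text direction. As a general illustration only, the Unicode Bidirectional Algorithm (UAX #9) chooses a paragraph's base direction from its first "strong" character. Hebrew and Yiddish letters are right-to-left, and so are Arabic letters. Latin, Thai, and CJK characters are left-to-right. The simplified Python sketch below applies that rule. The paragraph_direction function is a hypothetical name. The sketch ignores isolates and embeddings, it does not handle vertical CJK text, and it is not FineReader's method.

```python
import unicodedata


def paragraph_direction(text: str) -> str:
    """Guess a paragraph's base direction from its first strong character.

    Simplified form of UAX #9 rules P2-P3: isolates and embeddings are ignored.
    Returns "rtl" for Hebrew/Arabic-type letters and "ltr" otherwise.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":  # strong left-to-right (Latin, Thai, CJK, ...)
            return "ltr"
        if bidi_class in ("R", "AL"):  # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return "ltr"  # no strong character: UAX #9 defaults to left-to-right


if __name__ == "__main__":
    print(paragraph_direction("שלום world"))  # rtl
    print(paragraph_direction("123 hello"))   # ltr
```

Digits, spaces, and punctuation are skipped because they have weak or neutral direction. A paragraph that starts with a number followed by Hebrew text is therefore still treated as right-to-left.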

14.01.2020 17:26:19
