Improving Recognition Quality
Recognition quality depends not only on the quality of the image (see our Source Image Recommendations) but on the recognition settings as well.
When recognizing draft dot-matrix printouts or typewritten texts, you can sometimes improve recognition quality by selecting the right text type. Specify the text type in the TextTypes property of the RecognizerParams object. By default, the value of this property is TT_Normal, which corresponds to ordinary typographic text. However, you may also select a more specific type.
- Typewritten text: all letters are of equal width (compare, for example, "w" and "a"). Select TT_Typewriter for texts of this type.
- Draft dot-matrix text: character lines are made up of dots. Select TT_Matrix for texts of this type.
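The equal-width property of typewritten text can be illustrated with a small heuristic. The sketch below is not part of FineReader Engine: the function name `suggest_text_type`, the `tolerance` threshold, and the idea of measuring character cell widths yourself are all assumptions made for illustration. It returns the name of the text type constant that the descriptions above would suggest.

```python
def suggest_text_type(cell_widths, tolerance=0.1):
    """Suggest a text type name from measured character cell widths.

    cell_widths: widths (in pixels) of the character cells on a sample line.
    tolerance:   hypothetical threshold; the relative spread of widths below
                 which the text is treated as equal-width (typewritten).
    """
    if not cell_widths:
        raise ValueError("cell_widths must not be empty")
    if min(cell_widths) <= 0:
        raise ValueError("cell widths must be positive")

    mean_width = sum(cell_widths) / len(cell_widths)
    # Relative spread: how far apart the widest and narrowest cells are.
    spread = (max(cell_widths) - min(cell_widths)) / mean_width

    # Nearly identical widths suggest typewritten text; otherwise fall back
    # to the default type for ordinary typographic text.
    return "TT_Typewriter" if spread <= tolerance else "TT_Normal"
```

For example, widths such as `[20, 20, 21, 20]` would be classified as typewritten, while `[8, 20, 30, 14]` would fall back to the default type. Detecting dot-matrix text would need a different measurement and is not covered by this sketch.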
ABBYY FineReader Engine recognizes both monolingual and multilingual (i.e., written in several languages) documents. For multilingual documents, you must specify several recognition languages. English is the default recognition language. To change the default recognition language, use the SetPredefinedTextLanguage method of the RecognizerParams object.
Scanning facing pages
When you scan facing pages of a book, both pages appear as a single image.
To improve recognition quality, split the facing pages into two separate images. You can split such pages automatically using the IFRDocument::SplitPages method. To find the position at which to split the image into pages, use the IFRPage::FindPageSplitPosition method.
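To make the idea of a split position concrete, here is a minimal sketch of one common approach: a vertical projection profile that looks for the emptiest column near the middle of the image. It is not the algorithm used by FineReader Engine. The function names, the `search_fraction` parameter, and the image representation (a list of rows in which 1 means ink and 0 means background) are assumptions made for this example.

```python
def find_split_column(ink, search_fraction=1 / 3):
    """Return the index of the emptiest column in the central band.

    ink: binarized image as a list of equal-length rows (1 = ink, 0 = blank).
    search_fraction: hypothetical share of the image width, centered on the
                     middle, in which the gutter is expected to lie.
    """
    if not ink or not ink[0]:
        raise ValueError("image must be non-empty")

    width = len(ink[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[x] for row in ink) for x in range(width)]

    # Restrict the search to a band around the center of the image.
    band = max(1, int(width * search_fraction))
    start = (width - band) // 2
    end = start + band
    center = (width - 1) / 2

    # Pick the column with the least ink; prefer columns closer to the center.
    return min(range(start, end), key=lambda x: (profile[x], abs(x - center)))


def split_pages(ink):
    """Split a facing-pages image into left and right page images."""
    x = find_split_column(ink)
    left = [row[:x] for row in ink]
    right = [row[x:] for row in ink]
    return left, right
```

A real scan would also need to handle skew and a dark shadow at the binding, which this sketch ignores.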
When scanning very thick books, the text close to the binding may be distorted. The IFRPage::CorrectGeometricalDistortions method straightens out distorted lines on an image.
OCR quality may be affected by distorted text lines close to the margins, by document skew, by noise, and other defects commonly found on digital photos. A set of photo correction methods allows you to straighten out text lines, remove motion blur, and reduce noise:
- to straighten out distorted lines on an image, use the IFRPage::CorrectGeometricalDistortions method
- to remove motion blur, use the IImageDocument::RemoveMotionBlur method
- to remove noise from photos, use the IImageDocument::RemoveNoise method
If, for some reason, the source image resolution differs significantly from the recommended values (300 dpi for regular text and 400-600 dpi for text printed in a small font), or the font size is very small or very large, you can use the IFRPage::DetectResolution and IImageDocument::ChangeResolution methods to improve recognition quality.
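The decision of whether to change the resolution can be sketched as a small helper. The recommended values (300 dpi, and 400-600 dpi for small fonts) come from the text above. The function names and the 20% `tolerance`, which stands in for "significantly different", are assumptions made for this example. The helper only decides on a target value and does not call any FineReader Engine method.

```python
def recommended_dpi(small_font=False):
    """Return the recommended (low, high) resolution range in dpi."""
    return (400, 600) if small_font else (300, 300)


def plan_resolution(detected_dpi, small_font=False, tolerance=0.2):
    """Return the dpi to resample to, or None if no change is needed.

    detected_dpi: resolution reported for the source image.
    tolerance:    hypothetical relative margin outside the recommended
                  range that is still considered acceptable.
    """
    if detected_dpi <= 0:
        raise ValueError("detected_dpi must be positive")

    low, high = recommended_dpi(small_font)
    if low * (1 - tolerance) <= detected_dpi <= high * (1 + tolerance):
        return None  # close enough to the recommended range

    # Move to the nearest end of the recommended range.
    return low if detected_dpi < low else high
```

With these assumptions, a 96 dpi screenshot of regular text would be planned for 300 dpi, while a 450 dpi scan of small print would be left unchanged.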
Fonts for document synthesis
The appearance of the output document depends heavily on the set of fonts used during document synthesis. ABBYY FineReader Engine selects the best font from the set of fonts specified in the ISynthesisParamsForDocument::FontSet property. By default, the number of fonts in this set is chosen to balance processing speed against the quality of output documents. In some cases, however, you may need to change the default font set:
- You can specify the FNF_FineReader filter for the fonts in the FontNamesFilter property of the SystemFontSet or CustomFontSet object. This filter allows FineReader Engine to use more fonts during document synthesis and to select better fonts than in the default mode, although processing may slow down. This filter may be useful, for example, when converting to an editable format.
- You can use a predefined font filter for a particular language, e.g., FNF_Chinese, FNF_Japanese. Use the FontNamesFilter property of the SystemFontSet or CustomFontSet object.
- You can specify particular font families used in your document in the FontNamesCustomFilter property of the SystemFontSet or CustomFontSet object.