Tips and tricks

This chapter describes several typical ways to create FlexiLayouts that can find data fields on low-quality images. Such images are fairly common, and their defects are mostly caused by incorrect scanning settings. For instance, an image may be too bright or too dark if the brightness setting is wrong. As a result, some information on the image may be lost, or parts of the image may be noisy.

It is not always possible to rescan the documents, and the user often has to extract data from corrupted images. Moreover, some documents may have notes handwritten over useful information, which often causes recognition errors.

All of these kinds of damage to the text severely impair the quality of pre-recognition. The quality of pre-recognition may be improved by changing the recognition mode to Accurate. Unfortunately, this does not always help, and it greatly increases the pre-recognition time.

When a FlexiLayout is created in FlexiLayout Studio, the usual approach is to specify in the FlexiLayout that pre-recognition results may be inaccurate, i.e. may differ from the source text. This is reflected in the standard settings of an element, for example, in the maximum number of errors in an element of type Static Text, or the percentage of non-alphabetic characters in a Character String element. High pre-recognition quality is not actually required when searching for data fields. It is required, however, when the detected fields are recognized in FlexiCapture, which offers specialized data types for each field and thereby significantly improves the quality of recognition. Pre-recognition in FlexiLayout Studio is full-page OCR, and practice has shown that this is usually enough to detect the data fields on a document.
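The idea of tolerating inaccurate pre-recognition can be illustrated with a short Python sketch. It is not FlexiLayout Studio code and does not reproduce the program's actual matching algorithm: edit distance is used here only as an assumed stand-in for "number of errors", and the function names edit_distance, matches_static_text and non_alphabetic_share are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def matches_static_text(recognized: str, expected: str, max_errors: int) -> bool:
    """Accept a recognized string if it differs from the expected
    keyword by no more than max_errors edits (case-insensitive).
    Assumption: 'errors' are modelled as edit operations."""
    return edit_distance(recognized.lower(), expected.lower()) <= max_errors


def non_alphabetic_share(text: str) -> float:
    """Fraction of characters in text that are not letters."""
    if not text:
        return 0.0
    return sum(1 for ch in text if not ch.isalpha()) / len(text)


if __name__ == "__main__":
    # "lnvo1ce" is a plausible OCR reading of "Invoice" with two wrong characters.
    print(matches_static_text("lnvo1ce", "Invoice", max_errors=2))
    print(matches_static_text("lnvo1ce", "Invoice", max_errors=1))
    print(non_alphabetic_share("AB12"))
```

Raising the allowed number of errors makes the keyword easier to find on damaged images but also increases the chance of matching unrelated text, which is why the threshold is set per element.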

In real-life projects, it is usually enough to create just a few elements to get a FlexiLayout that can successfully process good-quality images. Any user can easily create a FlexiLayout that will detect the required data fields on about 70% of the images. Such a FlexiLayout can already be used in FlexiCapture. It can then be updated and 'taught' to extract data from low-quality images. The extent of such modification depends on the task at hand and the time available to the user.

Modifying a FlexiLayout involves identifying the elements that were not detected and then trying to find them with the help of additional elements (possibly of a different type) that have less strict search constraints.
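The strict-then-relaxed pattern can be sketched in Python as follows. This is only an illustration under stated assumptions, not FlexiLayout Studio code: the date format, the set of tolerated OCR confusions (O for 0, l or I for 1, a comma or space instead of a period) and the names strict_date, loose_date and find_with_fallback are all hypothetical.

```python
import re
from typing import Callable, Optional, Sequence

Searcher = Callable[[str], Optional[str]]


def strict_date(text: str) -> Optional[str]:
    """Find a cleanly recognized date of the assumed form DD.MM.YYYY."""
    match = re.search(r"\b\d{2}\.\d{2}\.\d{4}\b", text)
    return match.group(0) if match else None


def loose_date(text: str) -> Optional[str]:
    """Find a date-like string while tolerating a few assumed OCR
    confusions: letters O, o, l, I in place of digits, and a comma
    or whitespace in place of the period separator."""
    match = re.search(r"[\dOolI]{2}[.,\s][\dOolI]{2}[.,\s][\dOolI]{4}", text)
    return match.group(0) if match else None


def find_with_fallback(text: str, searchers: Sequence[Searcher]) -> Optional[str]:
    """Try each searcher in order and return the first result found.
    Stricter searchers should come first so that relaxed ones are
    used only when the strict search fails."""
    for searcher in searchers:
        result = searcher(text)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    searchers = [strict_date, loose_date]
    print(find_with_fallback("Date: 12.05.2019", searchers))
    print(find_with_fallback("Date: l2,O5.2O19", searchers))
```

Ordering the searchers from strict to relaxed keeps results on good-quality images unchanged, while the relaxed searcher picks up only the images on which the strict one finds nothing.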

There are also other situations which require modifications to the FlexiLayout, including the creation of additional elements. The user often has to process similar documents received from different sources, for example, documents created in different regional branches of a government institution. Such documents, despite their apparent likeness, may differ in the layout of data fields. In such cases it is advisable to create one FlexiLayout instead of several slightly different FlexiCapture Document Definitions.

Documents may also differ in the types of separators used on them, or they may be filled in either by hand or on a printer. To teach the program to find fields on such documents, use the methods described in this chapter.

A FlexiLayout Studio project which contains test images and a tested FlexiLayout can be found in %public%\ABBYY\FlexiCapture\12.0\Samples\FLS\Tips and Tricks.

Detecting dates in the case of low quality pre-recognition

Setting multiple static text values. Search for static text with similar values

Using Exclude to exclude elements

Using Group elements to optimize FlexiLayout structure and search

Searching for single-line Static Text elements

Restricting search area with RestrictSearchArea

Search for single-line fields of known or unknown format on documents of different quality

Elements search with Nearest and FuzzyQuality

Optimizing Group element search

The property "Optional" of a Group element

Digit strings search

Simplifying the FlexiLayout with an auxiliary element with a null hypothesis