
Pre-recognition

Pre-recognition is the first stage of processing a semi-structured document. Unlike fixed documents, which are designed with computer processing in mind, semi-structured documents vary in structure, and their data fields may appear in different parts of the page. For this reason, pre-recognition is used to detect objects in the document that may indicate the location of data fields.

Since pre-recognition may take considerable time, FlexiLayout Studio allows you to run it once, independently of FlexiLayout matching, so that you can concentrate entirely on creating and testing your FlexiLayout.

However, you need to assess the quality of the pre-recognition results before you start creating your FlexiLayout. The quality of pre-recognition depends on the quality of the test images in the batch, which in turn depends on scanning parameters such as brightness, contrast, and resolution. If you are not satisfied with the pre-recognition results, you may need to change the scanning options and re-scan your test documents. Note also that FlexiLayout Studio allows you to add images scanned at different resolutions, so that you can experiment with pre-recognition and FlexiLayout matching and select the optimal scanning options.

Pre-recognition can be run in fast or full mode (see Pre-recognition parameters for details). While a FlexiLayout is being developed, pre-recognition need not be perfect: practically any data field can still be found even if several recognition errors have been made. Indeed, pre-recognition speed is sometimes more important than quality. Recognition quality can be addressed at a later stage, in a data capture application, where you can specify a data type for each field and thereby greatly improve the quality of recognition.

During pre-recognition, the program analyzes the locations of dots of various colors, detects basic objects, and merges text fragments into words and lines.
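The merging step can be pictured with a small sketch. This is not how FlexiLayout Studio implements it; the `Fragment` type, the function names, and the `min_overlap` and `max_gap` thresholds are all hypothetical, chosen only to illustrate one plausible way of grouping text fragments into lines and words by their bounding boxes.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A text fragment with its bounding box in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_lines(fragments, min_overlap=0.5):
    """Group fragments whose vertical extents overlap into lines.

    min_overlap is an assumed threshold: the share of the shorter
    fragment's height that must overlap the first fragment of a line.
    """
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.top, f.left)):
        for line in lines:
            ref = line[0]
            overlap = min(ref.bottom, frag.bottom) - max(ref.top, frag.top)
            shorter = min(ref.bottom - ref.top, frag.bottom - frag.top)
            if shorter > 0 and overlap / shorter >= min_overlap:
                line.append(frag)
                break
        else:
            # No existing line overlaps this fragment: start a new line.
            lines.append([frag])
    for line in lines:
        line.sort(key=lambda f: f.left)
    return lines


def merge_into_words(line, max_gap=5):
    """Join fragments of one line into words.

    Fragments separated by at most max_gap pixels (an assumed value)
    are treated as parts of the same word.
    """
    if not line:
        return []
    words = []
    current = line[0].text
    prev_right = line[0].right
    for frag in line[1:]:
        if frag.left - prev_right <= max_gap:
            current += frag.text
        else:
            words.append(current)
            current = frag.text
        prev_right = frag.right
    words.append(current)
    return words


def merge_fragments(fragments):
    """Return a list of lines, each line being a list of words."""
    return [merge_into_words(line) for line in group_into_lines(fragments)]
```

For example, fragments "Inv" and "oice" that sit two pixels apart on the same baseline would be joined into the single word "Invoice", while a fragment 20 pixels further right would start a new word on the same line.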

The program detects the following types of basic objects:

  • Text
  • Picture
  • Punctuation mark
  • Inverted text
  • Separator
  • Barcode
  • Checkmark
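If you post-process such results in your own scripts, the object types above could be modeled roughly as follows. This is only an illustrative data model: the `BasicObjectType`, `BasicObject`, and `objects_of_type` names are hypothetical and are not part of FlexiLayout Studio.

```python
from dataclasses import dataclass
from enum import Enum


class BasicObjectType(Enum):
    """The basic object types listed above (hypothetical enum)."""
    TEXT = "Text"
    PICTURE = "Picture"
    PUNCTUATION_MARK = "Punctuation mark"
    INVERTED_TEXT = "Inverted text"
    SEPARATOR = "Separator"
    BARCODE = "Barcode"
    CHECKMARK = "Checkmark"


@dataclass
class BasicObject:
    """A detected object: its type and bounding box (assumed layout)."""
    kind: BasicObjectType
    rect: tuple  # (left, top, right, bottom) in pixels


def objects_of_type(objects, kind):
    """Return only the objects of the given type, preserving order."""
    return [obj for obj in objects if obj.kind is kind]
```

A filter like `objects_of_type(objects, BasicObjectType.TEXT)` would then select just the objects that carry text.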

Once basic objects have been detected, the program starts recognizing the text objects. Recognized text can be viewed as objects of the following two types:

  • Recognized Words
  • Recognized Lines

More:

Pre-recognition parameters

Running pre-recognition and viewing the results

Analyzing images

9/25/2020 9:24:45 AM

