
Character String

Character String is an element of a FlexiLayout which describes a string of characters written in one line from left to right. Character strings may consist of words or parts of words.

Character String elements are marked with a dedicated icon in the FlexiLayout tree.

Character String elements are used to look for unspecified text. The program will consider as candidates the Recognized Words objects detected during pre-recognition in the element’s search area.

Usually, character strings are located next to static text. When looking for the Ref. No. of a document, for example, the program must first find the static text "Ref. No." and then look for digits next to it.
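The idea of anchoring a search on static text can be sketched in Python. This is only an illustration, not how the program works internally: the `(text, left)` tuples standing in for recognized words, the function name `find_value_right_of`, and the assumption that "Ref. No." arrives as a single word object are all hypothetical.

```python
def find_value_right_of(words, label):
    """Return the first all-digit word to the right of `label`, or None.

    `words` is a list of (text, left) tuples for one text line (hypothetical
    stand-ins for recognized words and their left coordinates).
    """
    ordered = sorted(words, key=lambda w: w[1])  # read the line left to right
    for i, (text, _left) in enumerate(ordered):
        if text == label:
            # Look only at words located after the static text.
            for candidate, _ in ordered[i + 1:]:
                if candidate.isdigit():
                    return candidate
    return None


line = [("Date:", 300), ("Ref. No.", 10), ("48213", 90), ("2020", 360)]
print(find_value_right_of(line, "Ref. No."))
```

Here the digits closest to the right of the label are returned, and the year further along the line is ignored.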

Describing the search text

Click the Character String tab in the Properties dialog box to describe the corresponding object. To open the Properties dialog box, right-click the element in the FlexiLayout tree and select Properties... from the shortcut menu.


The text to find can be described in one of two ways.

Describing search text by means of a regular expression

A regular expression defines possible combinations of characters. If you use a regular expression, the hypothesis must meet its conditions. This method is usually used on good quality documents which are recognized without errors.

To enter a regular expression, select the Regular expression option and enter the expression in the field next to it. You can also click the button, which opens a drop-down list of options (Any Letter, Character From Set, etc.). Select the desired option to enter the corresponding regular expression into the field.

Regular expression alphabet
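The strictness of this method can be illustrated with a short Python sketch. It uses the syntax of Python's `re` module, not the program's own regular expression alphabet, and both the pattern and the function name `meets_expression` are made up for the example.

```python
import re

# Hypothetical pattern in Python `re` syntax (not the FlexiLayout syntax):
# two capital letters, a hyphen, then exactly four digits.
REF_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")


def meets_expression(hypothesis: str) -> bool:
    """True only if the whole hypothesis satisfies the expression."""
    return REF_PATTERN.fullmatch(hypothesis) is not None


print(meets_expression("AB-1234"))  # cleanly recognized text
print(meets_expression("AB-12S4"))  # one misrecognized character
```

A single recognition error (here "S" in place of "5") is enough to reject the text, which is why this method suits documents that are recognized without errors.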

Describing search text by means of an alphabet

An alphabet lists characters that may occur in the search text. This method is used whenever the character string cannot be described by means of a regular expression or there are too many errors in the recognized text as a result of poor image quality. You can specify several alphabets for a Character String element. If the format of the text is unknown, do not specify any alphabets. In this case the program will consider all possible characters when looking for the object corresponding to the element.

To describe search text by means of an alphabet:

  1. Select a hypothesis generation mode. To use the characters in the search area to generate all possible hypotheses, including intersecting and embedded hypotheses, select Allow embedded hypotheses. To generate only hypotheses of maximum length, clear Allow embedded hypotheses.
  2. Create one or more alphabets.

    More about creating, editing, and deleting alphabets

  3. In the Percentage of non-alphabet characters field, specify the allowed percentage of characters which do not belong to any of the alphabets.
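The effect of the percentage setting can be sketched in Python. This is an approximation under stated assumptions: the percentage is taken over all characters of the candidate text, and the function names are hypothetical. The program's exact counting rules (for example, how spaces are treated) may differ.

```python
import string


def non_alphabet_percentage(text, alphabets):
    """Percentage of characters in `text` that belong to none of the alphabets."""
    if not text:
        return 0.0
    allowed = set().union(*alphabets)  # merge all alphabets into one character set
    outside = sum(1 for ch in text if ch not in allowed)
    return 100.0 * outside / len(text)


def is_acceptable(text, alphabets, max_percentage):
    """Assumed rule: with no alphabets, any character is allowed."""
    if not alphabets:
        return True
    return non_alphabet_percentage(text, alphabets) <= max_percentage


# "O" was recognized instead of "0": one character out of five is outside the alphabet.
print(non_alphabet_percentage("12O45", [string.digits]))
print(is_acceptable("12O45", [string.digits], max_percentage=25))
```

Unlike a regular expression, this check tolerates a limited share of misrecognized characters, which makes it more suitable for poor quality images.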

Depending on the method used to describe the search text, you may need to specify the following properties:

  1. Select Whole words only if you wish to find whole words only.
  2. Use the Detect words by interword space option to specify how lines should be divided into words. Disable this option to detect words automatically. Enabling this option will divide a line into words whenever the space between neighboring characters is greater than or equal to the value entered in Min interword space.
    Note. In the case of automatic word detection, word ends are detected based on spaces or other symbols that separate words (e.g. " , ", " ; ", " / ", " ? " - the exact set of symbols depends on the selected pre-recognition language), or based on other attributes. To make sure that the program correctly divides lines into words, review the text objects on the test images (View → Images → Objects → Recognized Words).
  3. In the Word count fields, specify the number of words in the character string. The number of words is specified by means of a fuzzy interval. The default interval is {-1,-1,INF,INF} (i.e. the program looks for hypotheses containing any number of words).
  4. In the Max space length field, specify the maximum length of a space inside the object, measured in the user-defined units of measurement. You can estimate the length of a space from the coordinates of neighboring objects: rest the mouse cursor on a neighboring object to display its coordinates in the status bar. When the program looks for text, characters are added to the character string until the distance between neighboring elements exceeds Max space length.
  5. In the Character count field, specify the length of the character string (i.e. the number of characters in the string). The number of characters is specified by means of a fuzzy interval, which the program uses to assess the quality of a hypothesis based on its length.
    Use the button to specify fuzzy intervals in a separate window that visualizes fuzzy intervals for your convenience.
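The interplay of the spacing and count properties above can be sketched in Python. Everything here is an assumption made for illustration: characters are modeled as hypothetical `(text, left, right)` tuples, and the four numbers of a fuzzy interval are read as a trapezoid (quality 0 at or beyond the outer values, 1 between the inner values, linear in between). The program's actual scoring may differ.

```python
import math

INF = math.inf


def fuzzy_quality(value, a, b, c, d):
    """Assumed trapezoidal reading of the fuzzy interval {a, b, c, d}."""
    if b <= value <= c:
        return 1.0
    if value < b:
        if value <= a:
            return 0.0
        return 1.0 if math.isinf(a) else (value - a) / (b - a)
    # value > c
    if value >= d:
        return 0.0
    return 1.0 if math.isinf(d) else (d - value) / (d - c)


def build_string(chars, max_space_length, min_interword_space):
    """Collect characters left to right and return the words of the string.

    `chars` is a list of (text, left, right) tuples sorted by position.
    """
    if not chars:
        return []
    words = [chars[0][0]]
    for prev, cur in zip(chars, chars[1:]):
        gap = cur[1] - prev[2]
        if gap > max_space_length:
            break  # distance exceeds Max space length: the string ends here
        if gap >= min_interword_space:
            words.append(cur[0])  # wide enough space: start a new word
        else:
            words[-1] += cur[0]
    return words


chars = [("1", 0, 10), ("2", 12, 22), ("A", 40, 50), ("B", 52, 62), ("9", 200, 210)]
words = build_string(chars, max_space_length=30, min_interword_space=15)
print(words)
print(fuzzy_quality(len(words), 1, 2, 2, 3))       # word count scored against {1,2,2,3}
print(fuzzy_quality(len(words), -1, -1, INF, INF))  # default interval: any count is fine
```

With the default interval {-1,-1,INF,INF}, every non-negative count falls between the inner values, so under this reading the count places no restriction on the hypothesis.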

See also:

Creating and deleting elements

An overview of element properties

Search area

Additional search constraints for Character String element

25.09.2020 9:24:45

