Using Exclude to exclude elements
When creating a FlexiLayout, the search areas cannot always be defined purely in terms of "above", "below", "left of" and "right of" relations. This is particularly true when the search area contains unwanted objects, e.g. objects that may have been introduced during scanning. These objects should not make it into the recognition block, so they have to be excluded when creating the corresponding element. The address field in the figures below is a good example: what matters here is the relative position of the field names "Address:" and "(Address)" (which must be excluded from recognition) and the address itself (which should be recognized).
The Exclude function excludes a region from the search area of an element. The region to exclude is specified when calling the function. The excluded region may be either the region of a hypothesis (detected or not), a rectangle or an array of rectangles of the found elements, or a region built from the regions of objects which are part of a found hypothesis (see ExcludeSet method).
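The geometric effect of excluding a rectangle from a rectangular search area can be sketched in a few lines of Python. This is only a conceptual model, not the FlexiLayout language or the way the program is implemented: the Rect type, the exclude_rect helper and the coordinates are hypothetical names and values chosen for illustration.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle; y grows downward, as on a page image."""
    left: int
    top: int
    right: int
    bottom: int


def exclude_rect(area: Rect, cut: Rect) -> List[Rect]:
    """Return the parts of `area` that remain after removing `cut`.

    Hypothetical helper for illustration only.
    """
    # Clip the excluded rectangle to the search area.
    left = max(area.left, cut.left)
    top = max(area.top, cut.top)
    right = min(area.right, cut.right)
    bottom = min(area.bottom, cut.bottom)
    if left >= right or top >= bottom:
        return [area]  # no overlap, nothing to exclude

    parts = []
    if area.top < top:        # full-width band above the cut
        parts.append(Rect(area.left, area.top, area.right, top))
    if area.left < left:      # strip to the left of the cut
        parts.append(Rect(area.left, top, left, bottom))
    if right < area.right:    # strip to the right of the cut
        parts.append(Rect(right, top, area.right, bottom))
    if bottom < area.bottom:  # full-width band below the cut
        parts.append(Rect(area.left, bottom, area.right, area.bottom))
    return parts


# A field name occupying the start of the first line of the search area.
search_area = Rect(0, 0, 100, 60)
field_name = Rect(0, 0, 30, 20)
remaining = exclude_rect(search_area, field_name)
```

In this sketch the remaining search area consists of the rest of the first line plus the full-width lines below it, i.e. a non-rectangular region.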
Let us study the Exclude.fsp project (folder %public%\ABBYY\FlexiCapture\12.0\Samples\FLS\Tips and Tricks\Exclude) to see how the Exclude function works.
The project has 4 pages:
- Page 1 - the address field spans several lines. The name "Address:" is in the first line, so we cannot say that the sought data field is to the right of or below the field name. However, the name still has to be excluded from recognition;
- Page 2 - the address field has no name;
- Pages 3 and 4 - the line "(Address)" lies within the data field and has to be excluded from recognition.
On all the pages the address field is below the field with the company name.
First we create an element of type Static Text, name it CompanyHeader and specify its value. This element will be used to detect the name of the field.
Next we create an element of type Static Text and name it AddressHeader. This element will be used to detect the name of the address field. We specify the following static text values: Address:|(Address), i.e. we list all the variants of this line that should be detected and then excluded from recognition in the address field. We describe the search constraints for these lines in the Relations field and specify that these lines should be looked for below (Below) the CompanyHeader element.
Note. The element for the name must be created before the element for the address field, because the Exclude function only allows referring to elements that are higher up in the FlexiLayout tree than the current element.
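The ordering rule in this note can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the FlexiLayout tree is flattened into a list in top-to-bottom order, and invalid_exclude_refs is a hypothetical helper, not a function of FlexiLayout Studio.

```python
from typing import Dict, List, Tuple


def invalid_exclude_refs(
    order: List[str], excludes: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (element, referenced element) pairs that break the ordering rule.

    `order` lists the elements from the top of the tree downward;
    `excludes` maps an element to the elements whose regions it excludes.
    """
    seen = set()   # elements that are higher up in the tree
    problems = []
    for element in order:
        for ref in excludes.get(element, []):
            if ref not in seen:
                problems.append((element, ref))
        seen.add(element)
    return problems


excludes = {"Address": ["AddressHeader"]}

# AddressHeader is above Address in the tree: the reference is allowed.
good = invalid_exclude_refs(["CompanyHeader", "AddressHeader", "Address"], excludes)

# AddressHeader is below Address: the reference cannot be resolved.
bad = invalid_exclude_refs(["CompanyHeader", "Address", "AddressHeader"], excludes)
```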
Launch the FlexiLayout matching procedure (Match FlexiLayout) to make sure that the name of the address field is found on all the pages where it occurs.
Since the address field in this example has multiple lines, we create an element of type Paragraph to search for it. We name it Address and, as with the AddressHeader element, specify that it should be looked for below (Below) the CompanyHeader element.
Once you have run the FlexiLayout matching procedure, you will see that on pages 1, 3 and 4 the line with the name has also been included in the address field.
Now let us go back to the Properties dialog box of the Address element and click the Search Constraints tab. Click Add... next to the Exclude regions of elements section and choose the AddressHeader element from the list.
After saving the properties of the element, run the FlexiLayout matching procedure again. You will see that the region of the block which corresponds to the address field has acquired a non-rectangular shape, because the region of the name has been cut out of it.
Note. The Exclude regions of elements section allows you to exclude the region of a found hypothesis for another element from the search area of the current element. To exclude a more geometrically complex area, or to detect a region first and then exclude it from the recognition process, you should use the functions Exclude, ExcludeRect and ExcludeSet in the Advanced pre-search relations field.
Note. The region of a block by definition consists of rectangles stacked one on top of another, so any horizontal line can cross the region only once, i.e. there can be no vertical "teeth" jutting out. Therefore, if the line to be excluded and the required information are located on the same or roughly the same level, the name cannot be excluded at the stage of FlexiLayout creation. In that case the name can be excluded after recognition in FlexiCapture by using the "Substitute from the list" rule.
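The shape constraint described in this note can be checked with a short Python sketch. It is an illustrative model only: Rect and is_valid_block_region are hypothetical names, and the two sample regions (a name cut from the start of the first line, and a name cut from the middle of a line with data on both sides) are assumed layouts, not taken from the sample project.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle; y grows downward, as on a page image."""
    left: int
    top: int
    right: int
    bottom: int


def is_valid_block_region(rects: List[Rect]) -> bool:
    """True if every horizontal line meets the region in at most one interval."""
    # The cross-section can only change where some rectangle starts or ends.
    ys = sorted({y for r in rects for y in (r.top, r.bottom)})
    for y0, y1 in zip(ys, ys[1:]):
        # x-intervals of all rectangles that cover the band between y0 and y1.
        spans = sorted(
            (r.left, r.right) for r in rects if r.top <= y0 and r.bottom >= y1
        )
        reach = None  # right edge of the merged interval so far
        for left, right in spans:
            if reach is not None and left > reach:
                return False  # a gap: the line crosses the region twice
            reach = right if reach is None else max(reach, right)
    return True


# Name cut from the start of the first line: the rest of that line plus
# the full-width lines below it.
corner_cut = [Rect(30, 0, 100, 20), Rect(0, 20, 100, 60)]

# Name cut from the middle of a single line, leaving data on both sides.
middle_cut = [Rect(0, 0, 40, 20), Rect(60, 0, 100, 20)]

corner_ok = is_valid_block_region(corner_cut)
middle_ok = is_valid_block_region(middle_cut)
```

In this model the first region is acceptable, while the second is not, because a horizontal line through it would cross the region twice.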
12.04.2024 18:16:02