Sample 3. Step 13: Further analysis of the images
So far we have described the Invoice Number, Invoice Date, and Delivery Address fields.
Let us now decide in which order the other elements must be detected.
Analysis of the arrangement of the fields on the test images reveals that to detect the Country field, we can use the name of this field and an element of type Character String. To detect the Total Quantity and Total Amount fields, however, we will need not only the name, which is shared by the two fields, but also additional elements. The names of the corresponding table columns may be used for this purpose.
We also need to prepare the ground for the search of the Invoice Table field, which is to be detected by means of an element of type Table.
- We describe the Header of the table. There are two ways to detect column names.
- The first one is to specify one or several keywords in the Column Properties dialog box (the Properties dialog box of the Table element → Columns tab). This method is quick but not very flexible.
- The second way is to use a previously detected element as a reference element to find the location of the column name. This method allows you to take full advantage of the additional settings available for elements. Analysis of the pre-recognition results reveals that the recognition quality is good enough for the first method. The only snag is the very short name of the Quantity column: a 20% error allowance in the three-letter keyword “Qty” effectively means that no errors are allowed. We will use the first method to detect the first two columns, Reference and Designation, and use previously described elements to detect the names of the remaining columns: Quantity, Unit Price, and Total (we need to create these elements anyway to detect the Total Quantity and Total Amount fields).
- We describe the Footer of the table. It can also be described either by using keywords or by using an auxiliary element. Analysis of the keywords in the footer reveals that on some of the images the keywords also occur in the first row of the table, so we have to use an auxiliary element as a means to restrict the search area.
- We describe the table search area. After we have described the header and the footer, we need to describe the right boundary of the table (we do not need to describe the left boundary, as there are no other data in this part of the image). Since we cannot use the context to separate the figures in the Total column from the figures in the Sales column, we will need a different approach. We can use the name of the last column, Sales, to restrict the table search area on the right.
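The remark about the keyword “Qty” can be checked with a little arithmetic. The sketch below assumes that the permitted number of character errors is the keyword length multiplied by the error percentage and rounded down; the rounding rule and the function name allowed_errors are assumptions made for illustration, not settings taken from FlexiLayout Studio.

```python
def allowed_errors(keyword: str, max_error_percent: int = 20) -> int:
    """Whole number of character errors that fit within the percentage limit.

    Assumption: the limit is rounded down to a whole character.
    """
    return len(keyword) * max_error_percent // 100


# Column names taken from the sample invoice.
for name in ("Qty", "Reference", "Designation"):
    print(f"{name}: {allowed_errors(name)} error(s) allowed")
```

Under this assumption a three-letter keyword leaves no room for even one misrecognized character, while the longer names Reference and Designation tolerate one or two, which is why plain keywords are acceptable for those two columns.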
Note. Elements and the order in which they must be detected are selected by trial and error and can be changed during FlexiLayout adjustment.
Prior to describing the remaining fields, we will describe the auxiliary elements.
- Analysis of interdependencies among the elements reveals that first we have to detect the names of the Quantity, Unit Price, Total, and Sales columns.
- The header of the table begins with a horizontal separator between the column names and the Invoice Number, Invoice Date, and Delivery Address fields. This separator will help us to restrict the search areas of the column names.
- For this purpose, create an element of type Separator and name it hsTableHeaderTop (for detailed instructions, see Step 14).
We will look for the column names in their natural order.
- Create an element of type Group and name it TableHeader (for detailed instructions, see Step 15). This element must include:
- Element kwQuantity of type Static Text, which will correspond to the name of the Quantity column of the table InvoiceTable (for detailed instructions, see Step 16);
- Element kwUnitPrice of type Static Text, which will correspond to the name of the Unit Price column of the table InvoiceTable (for detailed instructions, see Step 17);
- Element kwTotal of type Static Text, which will correspond to the name of the Total column of the table InvoiceTable (for detailed instructions, see Step 18);
- Element kwSales of type Static Text, which will correspond to the name of the Sales column of the table InvoiceTable (for detailed instructions, see Step 19).
- Let’s continue to describe the lower part of the document. The elements we are interested in are the name shared by the Total Quantity and Total Amount fields (which serves as the bottom boundary of the table) and any other captions that can help locate the bottom boundary of the table. We will also describe the name element and the source element for the Country field in this logical group.
- Create a Group element and name it Footer. This element must include:
- Element kwFooter of type Static Text, which will correspond to the footer of the Invoice Table (for detailed instructions, see Step 21);
- Element kwTotal of type Static Text, which will correspond to the name of the Total Quantity and the Total Amount fields (for detailed instructions, see Step 22);
- Element kwOrigin of type Static Text, which will correspond to the name of the Country field (for detailed instructions, see Step 23);
- Element Country of type Character String, which will correspond to the Country field (for detailed instructions, see Step 24).
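The idea of restricting a search area with an auxiliary element, as the hsTableHeaderTop separator does for the column names, can be illustrated outside FlexiLayout Studio. The following is a minimal sketch only: the Hit class, the restrict_below function, and all coordinates are hypothetical, and the real search-area settings are configured in the element properties rather than in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    text: str
    top: int     # y of the upper edge in pixels; y grows downward
    bottom: int  # y of the lower edge in pixels


def restrict_below(hits: List[Hit], reference: Hit) -> List[Hit]:
    """Keep only the hits that lie entirely below the reference element."""
    return [h for h in hits if h.top >= reference.bottom]


# Hypothetical page: the word "Total" occurs above the separator
# and again in the row of column names.
separator = Hit("hsTableHeaderTop", top=400, bottom=402)
candidates = [
    Hit("Total", top=120, bottom=140),  # above the separator, not a column name
    Hit("Total", top=410, bottom=430),  # in the row of column names
]

column_name_hits = restrict_below(candidates, separator)
print(column_name_hits)
```

Only the occurrence below the separator survives the filter, so a keyword that appears elsewhere on the page no longer competes with the column name.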
Now that all the preparations have been made and the order of creating additional elements has been described, we can start looking for the remaining fields. Relying on the previously detected column names and the Footer, we will describe the remaining Total Quantity, Total Amount, and Invoice Table fields.
- For this purpose, create two elements of type Character String and name them TotalQuantity and TotalAmount. These elements will correspond to the Total Quantity and the Total Amount fields respectively (for detailed instructions, see Step 25).
- Create an element of type Table and name it InvoiceTable. This element describes the Invoice Table field (for detailed instructions, see Step 26).
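The order of elements planned in this step can be summarized as a small tree. The sketch below is an illustration only: the element names and types come from the text above, while the tuple layout, the walk function, and the dotted path notation are assumptions made for this example. Elements created in earlier steps are omitted.

```python
from typing import Iterator, List, Tuple

# Each element is (name, type, children); the layout is chosen for this sketch.
TREE: List[tuple] = [
    ("hsTableHeaderTop", "Separator", []),
    ("TableHeader", "Group", [
        ("kwQuantity", "Static Text", []),
        ("kwUnitPrice", "Static Text", []),
        ("kwTotal", "Static Text", []),
        ("kwSales", "Static Text", []),
    ]),
    ("Footer", "Group", [
        ("kwFooter", "Static Text", []),
        ("kwTotal", "Static Text", []),
        ("kwOrigin", "Static Text", []),
        ("Country", "Character String", []),
    ]),
    ("TotalQuantity", "Character String", []),
    ("TotalAmount", "Character String", []),
    ("InvoiceTable", "Table", []),
]


def walk(elements: List[tuple], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, type) pairs top to bottom, parents before children."""
    for name, kind, children in elements:
        path = prefix + name
        yield path, kind
        yield from walk(children, prefix=path + ".")


order = [path for path, _ in walk(TREE)]
for path, kind in walk(TREE):
    print(f"{path} ({kind})")
```

Writing the paths out makes it clear that the two kwTotal elements are distinct: one belongs to the TableHeader group and names the Total column, the other belongs to the Footer group and names the Total Quantity and Total Amount fields.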