Detecting multi-page tables
What is a multi-page table?
ABBYY FlexiLayout Studio can detect multi-page tables, i.e. tables that stretch over several pages of a document, the order and number of their columns being the same on all the pages. The width and the exact location of the columns may vary slightly from page to page (the maximum variation allowed by the program is 1 inch).
For the sake of brevity, we use the term one-page subtable for each part of a multi-page table located on one page.
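The column tolerance described above can be pictured with a short Python sketch. It is an illustration only, not part of FlexiLayout Studio: the function name `columns_match` and the representation of a column as a `(left, right)` pair of edges in inches are assumptions, and so is the idea that the 1-inch limit applies to each edge separately.

```python
MAX_VARIATION_INCHES = 1.0  # maximum variation stated in the text above


def columns_match(first_subtable, other_subtable, tolerance=MAX_VARIATION_INCHES):
    """Compare two one-page subtables given as lists of (left, right) column edges.

    The columns must come in the same order and number; each edge may shift
    by at most `tolerance` inches (an assumed reading of the 1-inch limit).
    """
    if len(first_subtable) != len(other_subtable):
        return False
    return all(
        abs(left_a - left_b) <= tolerance and abs(right_a - right_b) <= tolerance
        for (left_a, right_a), (left_b, right_b) in zip(first_subtable, other_subtable)
    )
```

For example, a two-column subtable whose columns shift by a few tenths of an inch on the next page would still match, whereas a subtable with a missing column would not.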
The header and footer of a multi-page table
Multi-page tables may have a header that either repeats on every page or occurs only once on the first page. The option Header is on each page tells the program whether the header must be detected on every page. If the option is disabled, ABBYY FlexiLayout Studio will search for the header only on the first one-page subtable.
The same applies to the footer of a table, which can occur either only once on the last page or at the bottom of every page. The option Footer is on each page tells the program whether the footer must be detected on every page. If the option is disabled, ABBYY FlexiLayout Studio will search for the footer only on the last one-page subtable.
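The effect of the two options can be summarized in a small Python sketch. The function `pages_to_search` and its parameters are hypothetical names chosen for illustration, and the sketch assumes that the number of one-page subtables is already known, which the program itself only learns as the search proceeds.

```python
def pages_to_search(subtable_count, header_on_each_page, footer_on_each_page):
    """Return (header_pages, footer_pages) as 1-based subtable numbers.

    With an option enabled, every subtable is searched; with it disabled,
    only the first subtable (header) or the last subtable (footer) is.
    """
    subtables = list(range(1, subtable_count + 1))
    header_pages = subtables if header_on_each_page else subtables[:1]
    footer_pages = subtables if footer_on_each_page else subtables[-1:]
    return header_pages, footer_pages
```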
Searching for a multi-page table
The program starts searching for a multi-page table from the first page of the search area specified by the user and continues the search on the subsequent pages. The search stops when any of the following conditions is met:
- The footer must occur only on the last page, and it has been detected.
- The program has reached the end of the search area specified for the table.
- The program has not detected a one-page subtable on a page, i.e. it has found neither the header, nor the footer, nor the table body.
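The stop conditions listed above can be sketched as a simple loop in Python. This is a simplified model under stated assumptions, not the program's actual algorithm: `PageResult` and `search_multipage_table` are hypothetical names, and each page is reduced to three flags saying whether a header, a table body, or a footer was found on it.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    """What was found on one page of the search area (illustrative only)."""
    header: bool = False
    body: bool = False
    footer: bool = False

    @property
    def has_subtable(self):
        # A page holds a one-page subtable if any of its parts was found.
        return self.header or self.body or self.footer


def search_multipage_table(pages, footer_on_each_page):
    """Return (indices of pages with a subtable, reason the search stopped)."""
    found = []
    for index, page in enumerate(pages):
        if not page.has_subtable:
            return found, "no subtable on page"
        found.append(index)
        # A footer expected only on the last page ends the table once detected.
        if not footer_on_each_page and page.footer:
            return found, "footer detected"
    return found, "end of search area"
```

Note that when the footer is expected on every page, detecting it does not end the search; only an empty page or the end of the search area does.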
Using the region of another element to detect the name of a column in a multi-page table
In the case of multi-page tables, you can still use two methods to search for the header and footer of a table or its subtables: either by specifying keywords (the Detect by keyword option) or by turning an already detected element into a header or footer (the Use found element as... option). If you use the first method, the program will search for the specified keywords on those pages where the header or footer may occur (as defined by the Header is on each page and Footer is on each page options). If keywords cannot reliably detect the column names on the images being processed, use the Use found element as... option.
If the header and footer of a multi-page table occur only on the first and the last page respectively, you can use already detected simple elements, just like for one-page tables. This method is used in Sample 3 (see the project in %public%\ABBYY\FlexiCapture\12.0\Samples\FLS\English\Invoice). However, if the header or the footer must be detected on every page, the best strategy is to use the subelements of an already detected Repeating Group. Using a Repeating Group, you can find the header on every page by describing it only once in the Repeating Group and specifying the possible number of instances for the Repeating Group. If the Table element is described below the Repeating Group in the tree of elements, use all the instances (AllInstances) of the desired subelements to link the column names to the Repeating Group. For an illustration of this approach, see the sample project in %public%\ABBYY\FlexiCapture\12.0\Samples\FLS\English\Invoice with Repeatable Groups.
Note. If there are several instances of the selected Repeating Group on a page, the first instance will be used as the name of the column.
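The rule in the note can be modeled in a few lines of Python. This is not FlexiLayout code: `column_name_per_page` is a hypothetical helper, and the instances of the Repeating Group are assumed to arrive as `(page_number, region)` pairs in the order in which they were detected.

```python
def column_name_per_page(instances):
    """Pick the first detected instance on each page as the column name.

    `instances` is an iterable of (page_number, region) pairs in detection
    order; later instances on the same page are ignored.
    """
    first_on_page = {}
    for page_number, region in instances:
        # setdefault keeps the earliest region seen for each page.
        first_on_page.setdefault(page_number, region)
    return first_on_page
```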
Searching for repeating tables
In some documents, there may be several identical tables on a page (the arrangement of the tables and the order of their columns may be the same). Sometimes, the last table in the series may be interrupted and continue on the following page. Between the tables there may be textual data or visual formatting elements, e.g. table captions or explanatory notes.
You can detect tables like these by placing the Table element that describes one instance of the table inside a Repeating Group. This allows you to describe the entire set of tables in one Table element and specify the repetitions of the instances in the properties of the Repeating Group that encompasses the Table element.
If you are using the Use found element as... option to detect column names based on an already detected element, it is more convenient to place this element into the same Repeating Group above the Table element. In this case, when you select an auxiliary element that will be used to detect the column name, reference the current instance (CurrentInstance) of the auxiliary element within the Repeating Group. Then, when the program searches for each instance of the column name, it will use the corresponding instance of the auxiliary element. For an illustration of this approach, see the sample project in %public%\ABBYY\FlexiCapture\12.0\Samples\FLS\English\Prices.
12.04.2024 18:16:02