Processing document sets
Document sets are processed differently from both individual documents and documents with multiple sections.
When a document set is processed, special completeness rules check that no documents are missing from the set. Completeness rules range from simple document listings to complex rules stipulating that a set must contain certain documents if they are referred to by other documents or if they are included on the inventory list.
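The two kinds of completeness rules described above can be illustrated with a minimal sketch. This is not the product's rule engine or API: the `Document` class, its `references` field, and the `check_completeness` function are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_type: str
    # Types of other documents this document refers to (hypothetical field).
    references: tuple = ()


def check_completeness(documents, required_counts, inventory=()):
    """Return a list of problems; an empty list means the set is complete."""
    problems = []
    present = Counter(doc.doc_type for doc in documents)

    # Simple rule: each required type must appear the expected number of times.
    for doc_type, expected in required_counts.items():
        if present[doc_type] != expected:
            problems.append(
                f"expected {expected} x {doc_type}, found {present[doc_type]}"
            )

    # Conditional rule: a document referred to by another document must be present.
    for doc in documents:
        for ref in doc.references:
            if present[ref] == 0:
                problems.append(f"{doc.doc_type} refers to missing {ref}")

    # Conditional rule: a document named on the inventory list must be present.
    for doc_type in inventory:
        if present[doc_type] == 0:
            problems.append(f"inventory lists missing {doc_type}")

    return problems
```

Called with the documents of a set, the required counts per type, and the types named on the inventory list, the function returns one message per violated rule.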
A document set goes through the following processing stages:
- Checking that the set contains all of the documents it should contain, checking the number of documents of each type, and, optionally, checking the order of the documents in the set.
- Capturing data from one main document in the set, or capturing data from multiple documents and detecting any contradictions between them (e.g. to make sure that all the documents relate to the same person or organization).
- Visually checking documents for signatures and seals.
- Creating a searchable PDF from all the documents comprising the set.
- Exporting the captured data to a database, together with links to the original document images.
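The contradiction check in the data capture stage can be sketched as follows. This is an illustration only, assuming the captured data is available as plain dictionaries; the function name `find_contradictions` and the field names in the usage note are hypothetical and not part of the product.

```python
def find_contradictions(captured, shared_fields):
    """Find fields whose captured values differ between documents.

    captured: mapping of document name -> dict of captured field values.
    shared_fields: names of fields that must agree across the set.
    Returns {field: {normalized value: [document names]}} for each
    field that has more than one distinct value.
    """
    contradictions = {}
    for name in shared_fields:
        values = {}
        for doc_name, fields in captured.items():
            value = fields.get(name)
            if value is None:
                continue  # this field was not captured from this document
            # Ignore differences in letter case and whitespace only.
            normalized = " ".join(str(value).split()).casefold()
            values.setdefault(normalized, []).append(doc_name)
        if len(values) > 1:
            contradictions[name] = values
    return contradictions
```

For example, if an `applicant` field is captured from an application, a passport, and a receipt, the result groups the document names by the value each one carries, so an operator can see which document disagrees with the others.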
A document set may contain documents from which no data should be captured but whose images must still be included in the processing results. Such documents do not require optical recognition, but their type still needs to be detected to make sure that no documents are missing from the document set. Examples include handwritten applications, certificates, and receipts.
Document set recognition has two distinctive features:
- It is not mandatory to list child documents. It is enough to specify only the document sets to be recognized. To do this, open the Recognition tab in the batch type properties. Sets that correspond to the specified definitions will be fully recognized.
- If a child document is moved to the top level of a set, an assembly error occurs because the matched definition does not comply with the set structure. To avoid such errors, add the child document definitions to the general recognition list.
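The relationship between set definitions, child definitions, and the general recognition list can be pictured with a toy model. This is a simplified sketch and does not reproduce the product's actual assembly logic; `check_top_level` and the definition names in the usage note are hypothetical.

```python
def check_top_level(doc_definition, set_definitions, recognition_list):
    """Return None if a top-level document is acceptable, else an error string.

    set_definitions: mapping of set definition name -> set of child definition names.
    recognition_list: definitions enabled for recognition on their own.
    """
    # A definition on the general recognition list is valid at the top level.
    if doc_definition in recognition_list:
        return None

    # Otherwise the definition is known only through the sets that contain it.
    parents = sorted(
        name for name, children in set_definitions.items()
        if doc_definition in children
    )
    if parents:
        return (
            f"assembly error: {doc_definition} is only defined "
            f"as a child of {', '.join(parents)}"
        )
    return f"assembly error: {doc_definition} matches no listed definition"
```

In this model, a `passport` definition that exists only as a child of a `loan_set` definition produces an error when found at the top level, and the error disappears once `passport` is added to the recognition list.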