Document
A set of one or more page images and the data extracted from them.
Document Definition
A Document Definition specifies how a particular type of document is identified and processed. It defines:
- The document structure, i.e. the allowed order of pages in documents of this type (this information will be used for the correct assembly of pages into documents)
- Document sections
- Rules that field data should satisfy
- The locations of the fields and their captions on the data form
- Document export settings
- Document processing settings
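The items listed above can be pictured as one record per document type. The following is a minimal sketch only: the class names, attribute names, and the exact-match page check are assumptions made for illustration and do not reflect the product's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FieldSpec:
    """Hypothetical description of one field on the data form."""
    name: str
    caption: str
    region: tuple[int, int, int, int]  # assumed (left, top, right, bottom) box


@dataclass
class DocumentDefinition:
    """Hypothetical container mirroring the items a Document Definition defines."""
    doc_type: str
    page_order: list[str]                     # allowed order of pages
    sections: list[str]                       # document sections
    fields: list[FieldSpec]                   # field and caption locations
    rules: dict[str, Callable[[str], bool]]   # field name -> validation rule
    export_settings: dict[str, str] = field(default_factory=dict)
    processing_settings: dict[str, str] = field(default_factory=dict)

    def pages_in_order(self, pages: list[str]) -> bool:
        # Simplification: the only allowed order is an exact match.
        return pages == self.page_order

    def violations(self, data: dict[str, str]) -> list[str]:
        # Names of extracted fields whose values break their rule.
        return [name for name, check in self.rules.items()
                if name in data and not check(data[name])]


invoice = DocumentDefinition(
    doc_type="invoice",
    page_order=["cover", "line_items"],
    sections=["header", "totals"],
    fields=[FieldSpec("total", "Total:", (400, 700, 560, 730))],
    rules={"total": lambda v: v.replace(".", "", 1).isdigit()},
)

ordered = invoice.pages_in_order(["cover", "line_items"])
good = invoice.violations({"total": "12.50"})
bad = invoice.violations({"total": "abc"})
```

Here `ordered` tells whether a set of pages can be assembled into an invoice, and `violations` returns the names of fields whose data fails a rule.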
Document type
Documents that have certain characteristics in common and thus are handled in a uniform manner within a business process. Examples of document types include invoices, contracts, and passports.
Entity
A field or a group of fields containing information that needs to be extracted using NLP technology. Examples of entities include people, companies, places, amounts, and dates.
Field
A document element intended for data extraction. Fields may be simple or complex. An example of a complex field is a field of type "Table," where each cell can be regarded as a separate child field.
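The difference between simple and complex fields can be sketched as a small tree, where a complex field such as a table holds its cells as child fields. The `Field` class, the `table_field` helper, and the cell naming scheme below are hypothetical and serve only to illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical field: simple if it has no children, complex otherwise."""
    name: str
    value: str | None = None
    children: list[Field] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        return bool(self.children)

    def leaves(self) -> list[Field]:
        # Collect the simple fields at the bottom of the tree, in order.
        if not self.children:
            return [self]
        result: list[Field] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


def table_field(name: str, rows: list[dict[str, str]]) -> Field:
    # Each cell becomes a separate child field of the table.
    cells = [Field(f"{name}[{r}].{column}", value)
             for r, row in enumerate(rows)
             for column, value in row.items()]
    return Field(name, children=cells)


total = Field("total", "10.00")
items = table_field("items", [{"desc": "Pen", "qty": "2"},
                              {"desc": "Ink", "qty": "1"}])
cell_values = [cell.value for cell in items.leaves()]
```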
NER (Named Entity Recognition)
An information extraction task that seeks to locate and classify named entity mentions in unstructured text.
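To make "locate and classify" concrete, the toy sketch below finds mentions in a text and assigns each a label and character offsets. It uses hand-written regular expressions as a stand-in; real NER relies on trained models, and the labels and patterns here are assumptions chosen for illustration.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Assumed toy patterns; a trained model would replace these in practice.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "AMOUNT": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def find_mentions(text: str) -> list[Mention]:
    """Locate each pattern match and classify it with the pattern's label."""
    mentions = [Mention(m.group(), label, m.start(), m.end())
                for label, pattern in PATTERNS.items()
                for m in pattern.finditer(text)]
    return sorted(mentions, key=lambda mention: mention.start)


sample = "Invoice dated 2023-05-01 for $120.50."
mentions = find_mentions(sample)
```

Each `Mention` records both where the entity was found (`start`, `end`) and how it was classified (`label`).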
NLP (Natural Language Processing)
A subfield of artificial intelligence and computational linguistics that studies computer analysis and synthesis of natural languages. One application of NLP is information extraction. Other uses of NLP include machine translation, chatbots, document classification, and sentiment analysis.
NLP model
A mechanism that determines which entities and segments should be extracted from texts and how. The subject area and the extraction algorithm are selected when training an NLP model.
Segment
A text fragment consisting of one or more paragraphs that contains data that needs to be extracted. A segment can also be a field that needs to be extracted (for example, conditions for terminating an agreement).
Segmentation
The process of identifying segments. Segmentation precedes information extraction and is useful in the case of large documents, as it narrows down a search for entities to specific text fragments.
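The narrowing effect can be illustrated with a small sketch: the text is split into paragraphs, only the paragraphs that look relevant are kept as segments, and extraction then runs on those segments alone. Keyword matching is used here purely as a stand-in for a trained segmentation model, and all function names are hypothetical.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    # Paragraphs are assumed to be separated by blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def find_segments(text: str, keywords: list[str]) -> list[str]:
    # Stand-in for segmentation: keep paragraphs that mention any keyword.
    return [p for p in split_paragraphs(text)
            if any(k.lower() in p.lower() for k in keywords)]


def extract_day_counts(fragment: str) -> list[int]:
    # Toy extraction step: numbers followed by the word "days".
    return [int(n) for n in re.findall(r"(\d+)\s+days", fragment)]


contract = (
    "Payment is due within 30 days of invoice.\n\n"
    "Either party may terminate this agreement with 60 days written notice.\n\n"
    "This agreement is governed by local law."
)

segments = find_segments(contract, ["terminat"])
notice_periods = [n for s in segments for n in extract_day_counts(s)]
all_periods = extract_day_counts(contract)
```

Searching the whole contract also picks up the payment term, whereas searching only the termination segment returns the notice period alone.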