How to Set the Number of Pages to Recognize
Recognizing an entire document is not always necessary: the first few pages are often enough to index the document and add it to the database.
Recognizing documents only partially can significantly reduce processing time and conserve the pages in your license, because the number of pages available in your license decreases only by the number of pages actually recognized. During the verification stage, a user can check whether all the required data has been recognized and, if it has not, select additional pages for recognition.
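The license saving can be worked out file by file. The Python sketch below is a hypothetical illustration: it assumes that a file shorter than the requested number of pages is simply recognized in full, it ignores any extra pages a user adds during verification, and the function name pages_consumed and the page counts are made up for the example.

```python
# Hypothetical sketch: license pages consumed when only the first N pages of
# each file are recognized. Page counts below are invented for illustration.


def pages_consumed(page_counts: dict[str, int], first_pages: int) -> int:
    """Sum the recognized pages per file. Assumption: a file with fewer than
    first_pages pages is recognized in full. Pages added later during
    verification are not counted."""
    return sum(min(count, first_pages) for count in page_counts.values())


if __name__ == "__main__":
    counts = {"report.pdf": 40, "scan.tif": 2}  # hypothetical page counts
    full = sum(counts.values())
    partial = pages_consumed(counts, first_pages=3)
    # prints "full recognition: 42 pages, first 3 pages only: 5 pages"
    print(f"full recognition: {full} pages, first 3 pages only: {partial} pages")
```

With these numbers, recognizing only the first three pages of each file uses 5 license pages instead of 42.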
Important! You can limit recognition to the first pages of documents only if both of the following conditions are met:
- Each file in a job is processed and separated as an individual document (the "Create one document for each file in job" option on the 3. Document Separation tab of the Workflow Properties dialog must be enabled).
- Data is exported only to text formats, such as TXT and HTML. PDF does not count as a text format.
If other options are selected, all pages of the documents are recognized, and notifications appear in the Job Log with the following message: Process first pages setting is not compatible with any document separation method except "Create one document for each file in job".
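The two conditions above can be expressed as a single check, for example in a script that reviews workflow settings before jobs are submitted. This is a hypothetical Python model, not part of ABBYY FineReader Server: the function name, the separation-mode string, and the format set are illustrative, and the set lists only the two text formats named in this section, so it may be narrower than what the product actually accepts.

```python
# Hypothetical model of the two documented conditions for partial recognition.
# Names and strings are illustrative; they are not ABBYY FineReader Server APIs.

# Option that must be enabled on the 3. Document Separation tab.
SEPARATE_EACH_FILE = "Create one document for each file in job"

# Assumption: only the text formats named in this section are listed; the
# product may accept others. PDF is excluded because it is not a text format.
TEXT_FORMATS = {"TXT", "HTML"}


def first_pages_setting_applies(separation_mode: str,
                                export_formats: list[str]) -> bool:
    """Return True if each file becomes its own document and every export
    format is a text format; otherwise all pages would be recognized."""
    per_file = separation_mode == SEPARATE_EACH_FILE
    # Assumption: an empty export list is treated as not meeting the condition.
    text_only = bool(export_formats) and all(
        fmt.upper() in TEXT_FORMATS for fmt in export_formats
    )
    return per_file and text_only


if __name__ == "__main__":
    print(first_pages_setting_applies(SEPARATE_EACH_FILE, ["TXT", "HTML"]))  # True
    print(first_pages_setting_applies(SEPARATE_EACH_FILE, ["TXT", "PDF"]))  # False
    print(first_pages_setting_applies("another separation method", ["TXT"]))  # False
```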
To set up partial recognition of documents using an XML ticket, complete the following steps.
- Create an XML ticket that contains the following information:
  - In the PageNumToRecognizeForSingleInputFile attribute of the <XmlTicket> element, specify how many pages to process from the beginning of each document. Keep in mind that documents often start with a title page and a table of contents, so the first few pages may not contain any useful information.
  - In the Name attribute of the <InputFile> element, specify the name of the file you want to recognize. To partially process two or more documents, add one <InputFile> element for each of them.
Example of an XML ticket:
<XmlTicket PageNumToRecognizeForSingleInputFile="3">
  <InputFile Name="50.pdf" />
  <InputFile Name="100.tif" />
</XmlTicket>
This XML ticket tells ABBYY FineReader Server to recognize the first three pages in each file.
- Place the XML ticket in the Input folder used in the current workflow.
- Place the image files listed in the ticket in the Input folder used in the current workflow. If the workflow is running, ABBYY FineReader Server begins recognizing them automatically.
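The steps above can be scripted. The following Python sketch uses the standard xml.etree.ElementTree module to build a ticket with the same structure as the example, writes it to an Input folder, and then copies the input files there, in the order the steps give. It assumes the Input folder is reachable as a local or mapped path; the ticket file name ticket.xml and the helper names build_ticket and submit are hypothetical, and the ticket carries only the two items described in this section.

```python
# Hypothetical helper: builds an XML ticket like the example in this section
# and places it, followed by the input files, in a workflow's Input folder.
# The ticket file name "ticket.xml" is an assumption, not a product requirement.
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_ticket(file_names: list[str], first_pages: int) -> ET.ElementTree:
    """Return <XmlTicket PageNumToRecognizeForSingleInputFile="N"> with one
    <InputFile Name="..."/> child per file name."""
    root = ET.Element(
        "XmlTicket", PageNumToRecognizeForSingleInputFile=str(first_pages)
    )
    for name in file_names:
        ET.SubElement(root, "InputFile", Name=name)
    tree = ET.ElementTree(root)
    ET.indent(tree)  # pretty-print for readability (Python 3.9+)
    return tree


def submit(files: list[Path], input_folder: Path, first_pages: int,
           ticket_name: str = "ticket.xml") -> Path:
    """Write the ticket first, then copy the files, in the order the steps give."""
    input_folder.mkdir(parents=True, exist_ok=True)
    ticket_path = input_folder / ticket_name
    build_ticket([f.name for f in files], first_pages).write(
        ticket_path, encoding="utf-8", xml_declaration=True
    )
    for f in files:
        shutil.copy2(f, input_folder / f.name)
    return ticket_path


if __name__ == "__main__":
    # Demo with temporary folders standing in for real source and Input folders.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        files = [src / "50.pdf", src / "100.tif"]
        for f in files:
            f.write_bytes(b"")  # empty placeholder files for the demo
        ticket = submit(files, Path(tmp, "Input"), first_pages=3)
        print(ticket.read_text(encoding="utf-8"))
```

The demo only uses temporary folders; to use the sketch with a real workflow, pass that workflow's Input folder as input_folder.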