How to Calculate the Quota of Pages Allowed by the License
ABBYY FineReader Server 14 has a new workflow type called "audit workflow." An audit workflow analyzes files in a specified storage location and provides statistics that can be used to calculate the number of pages that may be deducted from your license.
An audit workflow analyzes the formats of input files and classifies all input files into the following groups:
- files that need to be recognized (images and PDFs without a text layer)
- files that need to be converted (office format documents, e-mail messages, etc.)
- files that may require conversion
- files in unsupported formats
For each file, the program will calculate the required number of pages.
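For a rough idea of how the four groups above might look for your own storage before you run an audit workflow, the grouping can be approximated by file extension. The following is a minimal sketch only: the extension sets, the group names, and the `classify` and `summarize` helpers are illustrative assumptions, not the rules FineReader Server applies. In particular, a PDF is placed in the "may require conversion" group here because the presence of a text layer cannot be determined from the file name alone.

```python
from collections import Counter
from pathlib import Path

# Illustrative extension sets (assumptions, not the product's real rules).
RECOGNIZE = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}
CONVERT = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".eml", ".msg"}
MAY_REQUIRE = {".pdf"}  # assumption: text layer is unknown from the name alone


def classify(path):
    """Return a hypothetical group name for one file, based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in RECOGNIZE:
        return "recognize"
    if ext in CONVERT:
        return "convert"
    if ext in MAY_REQUIRE:
        return "may_require_conversion"
    return "unsupported"


def summarize(paths):
    """Count how many of the given files fall into each group."""
    return Counter(classify(p) for p in paths)


if __name__ == "__main__":
    sample = ["scan.TIF", "letter.docx", "report.pdf", "archive.xyz"]
    for group, count in sorted(summarize(sample).items()):
        print(group, count)
```

A sketch like this only counts files. It does not estimate pages, which is what the audit workflow reports for each file.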
Note. In ABBYY FineReader Server 14, Fast analysis of document metadata is enabled by default. Page count information obtained using this analysis type will only be an approximation:
- For some files, the number of pages they contain cannot be calculated.
- During processing, the number of pages may increase for input files that contain pages larger than A4.
- A PDF file with a good text layer that contains pages with no text may require re-recognition.
See also: Rules for deducting pages for jobs.
To get a more precise page count, navigate to the Settings tab in the Audit workflow settings dialog box and select Thorough analysis. This analysis type is available in FineReader Server 14 R3 Update 2 and later versions.
Note. Thorough analysis significantly slows down document processing.
To create a new audit workflow, select the Workflows node in the tree on the left and click the (Create New Audit Workflow) button on the toolbar. You can also copy an existing workflow and change the settings as needed. To do this, select Duplicate on the shortcut menu of your workflow. In order to view or edit the workflow properties, select the appropriate node and click the (Workflow Properties) button on the toolbar. Alternatively, you can select Properties on the shortcut menu.
You can choose a file storage location in the Audit workflow settings dialog box. You can specify a shared folder on a local or network drive, an FTP/SFTP server folder, or a SharePoint library. The audit workflow is considered complete when all files in the storage have been analyzed. To analyze new files, launch the audit workflow again. The audit workflow does not modify the contents of the file storage.
In the audit workflow settings, you can specify additional reports to be created for specific file categories:
- Files that are over the specified size limit (in megabytes).
Processing large files is not always desirable, as it can slow down your system and delay other tasks. This report can be used to decide which large files to process separately.
- Files that were last modified before the specified date.
This report lets you determine the percentage of outdated files whose processing can be delayed.
- Duplicate files.
This report contains a list of all duplicate files found in the storage, as well as their sizes and locations. This will let you optimize your storage and process only the files that actually require processing.
Note. Enabling duplicate search can significantly slow down the audit workflow.
Note. If there are custom columns in your SharePoint library, duplicate search will not work for DOC, DOCX, XLS, XLSX, PPT, and PPTX documents, as Microsoft SharePoint modifies these types of files by adding custom properties.
- Regular expression search.
This report will be created for files that contain text matching a regular expression.
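The four report categories above can be pictured with a small standalone script run against a local folder. This is a sketch under stated assumptions, not the audit workflow's implementation: the `build_reports` and `file_digest` functions and the dictionary keys are hypothetical names. Duplicates are detected here by comparing SHA-256 digests of file contents, and the regular expression is applied to the raw file text decoded as UTF-8. How FineReader Server performs either check is not described in this topic.

```python
import hashlib
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_reports(root, size_limit_mb, cutoff, pattern):
    """Collect files for four hypothetical reports under the folder `root`.

    size_limit_mb: files larger than this many megabytes are listed as oversized.
    cutoff:        datetime; files last modified before it are listed as outdated.
    pattern:       regular expression searched in the raw file text (assumption).
    """
    regex = re.compile(pattern)
    oversized, outdated, regex_matches = [], [], []
    by_digest = defaultdict(list)

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_size > size_limit_mb * 1024 * 1024:
            oversized.append(path)
        if datetime.fromtimestamp(info.st_mtime) < cutoff:
            outdated.append(path)
        # Files with identical content share a digest (a costly step on big storages).
        by_digest[file_digest(path)].append(path)
        # Simplification: undecodable bytes are ignored rather than recognized.
        if regex.search(path.read_text(encoding="utf-8", errors="ignore")):
            regex_matches.append(path)

    duplicates = [group for group in by_digest.values() if len(group) > 1]
    return {
        "oversized": oversized,
        "outdated": outdated,
        "duplicates": duplicates,
        "regex_matches": regex_matches,
    }
```

For example, `build_reports("D:/archive", 50, datetime(2015, 1, 1), r"invoice \d+")` would list files over 50 MB, files last modified before 2015, groups of identical files, and files whose text matches the pattern. The folder path and the values are placeholders. Because every file is read in full to compute its digest, this sketch also suggests why a duplicate search can slow an audit down.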
Note. Starting from FineReader Server 14 Release 2, page processing statistics for all workflows are displayed in the details pane. The total number of processed pages is also displayed in the FineReader Server node.