ABBYY FlexiCapture (the System) extracts data from documents that arrive in continuous streams, which is why we measure its performance as the volume processed per unit of time.
When designing the System, start by defining the target performance in terms of performance metrics.
The required processing time is defined by the client company's internal procedures, service-level agreements, and business process requirements.
Processing volumes are estimated from historical data and business development trends, or from the company's business plan. Occasional or seasonal spikes in volume may occur, for example after a successful advertising campaign or at fiscal year-end.
The following parameters shape the system workload:
- Average batch size in pages
- Image color mode: color, grayscale, black-and-white
- Pages per day (i.e., 24 hours), average/peak
- Pages per hour, average/peak
- Average document size in pages
- Number of scanning operators
- Number of verification operators
- Document storage time
Average batch size
A batch is a set of related documents that are processed together.
E.g.: A customer submits a dozen documents under a single request. They must be processed together because cross-checks and business logic do not allow them to be processed independently.
Image color mode
Document images come in many forms, for example:
- scanned copies in color, grayscale or black-and-white;
- photos in different resolutions;
- email attachments – vector PDF files, etc.
The color mode of document images depends on:
- Whether the company can control and alter the input data.
E.g.: If FlexiCapture clients are used for scanning, the company can set the same scanning mode (color mode) for all incoming documents.
- Long-term storage requirements.
E.g.: According to corporate regulations, all documents should be stored for 5 years as grayscale images only. In this case, FlexiCapture clients can convert color images to grayscale images at the scanning stage.
Even when a company is obliged to store incoming documents in their original formats, it can usually estimate which formats to expect and provide sample images. The most expensive scenario is when all document images are in color, because color images drive up network transmission and file storage costs.
Pages per day & pages per hour
Average and peak performance are defined as the average and peak numbers of color, grayscale, or black-and-white pages processed over a time period the company chooses (1 hour, 24 hours, etc.). When choosing these periods:
- Specify precise time intervals: “24 hours” is better than “1 day”, which can be misread as a single workday, i.e., only 8 to 12 hours.
- Choose intervals that are meaningful to you, so that you can easily check whether the system meets your needs and expectations.
E.g.: “1000 pages in 24 hours” is a better checkpoint for a customer than “0.01 pages per second”.
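The two ways of stating the requirement are related by simple arithmetic. The minimal Python sketch below converts between them; the function names are hypothetical helpers for illustration and are not part of FlexiCapture. It also shows why the interval must be explicit: reading “1 day” as an 8-hour workday triples the required rate.

```python
SECONDS_PER_HOUR = 3600

def pages_per_second(pages: float, period_hours: float) -> float:
    """Convert a page count over a period of `period_hours` hours to pages per second."""
    return pages / (period_hours * SECONDS_PER_HOUR)

def pages_per_period(rate_per_second: float, period_hours: float) -> float:
    """Convert a per-second rate back to a page count over `period_hours` hours."""
    return rate_per_second * period_hours * SECONDS_PER_HOUR

rate_24h = pages_per_second(1000, 24)      # "1000 pages in 24 hours"
rate_workday = pages_per_second(1000, 8)   # same figure misread as an 8-hour workday
print(f"24 hours: {rate_24h:.4f} pages/s")
print(f"8-hour workday: {rate_workday:.4f} pages/s")
print(f"back to 24 hours: {pages_per_period(rate_24h, 24):.0f} pages")
```

Here “1000 pages in 24 hours” comes out at about 0.0116 pages per second, while the workday reading gives about 0.0347 pages per second, three times the rate.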
We use pages rather than documents to estimate processing volume because documents vary significantly in size. At the same time, it is usually easy to estimate the average size, in pages, of documents of one type. E.g.: An invoice may contain anywhere from 1 to over 100 pages, but on average it typically has 3 pages.
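Converting a document forecast into pages is then a multiplication per document type. The Python sketch below assumes a hypothetical daily mix: the 3-page invoice average follows the example above, while the contract figures are invented purely for illustration.

```python
def pages_from_documents(documents: float, avg_pages_per_document: float) -> float:
    """Estimate page volume from a document count and the average document size in pages."""
    return documents * avg_pages_per_document

# Hypothetical mix: document type -> (documents per 24 hours, average pages per document)
daily_mix = {
    "invoice": (400, 3),    # 3 pages on average, as in the invoice example
    "contract": (50, 12),   # assumed figures, for illustration only
}

total_pages = sum(pages_from_documents(n, avg) for n, avg in daily_mix.values())
print(total_pages)
```

With these assumed numbers the mix amounts to 1,800 pages per 24 hours, which is the figure that feeds the page-based sizing.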
Finally, we need to express these volumes in bytes and bits per second, the units commonly used to size hardware. To do this, we use typical sizes of an A4 page in each color mode:
- A4 black-and-white – 100 KB
- A4 grayscale – 3 MB
- A4 color – 10 MB
For a more precise estimate, a sample of actual documents is required.
Given the typical page size for each color mode and the average and peak numbers of pages per day or per hour, you can estimate the average and peak input flow in bytes per second.
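A minimal Python sketch of this estimate, using the typical A4 sizes listed above. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), and the page counts are a hypothetical workload, not measured data; `input_flow` is an illustrative helper name.

```python
# Typical A4 page sizes in bytes.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes).
PAGE_SIZE_BYTES = {
    "bw": 100_000,           # black-and-white, ~100 KB
    "grayscale": 3_000_000,  # ~3 MB
    "color": 10_000_000,     # ~10 MB
}

def input_flow(pages_by_mode: dict[str, float], period_hours: float) -> tuple[float, float]:
    """Return (bytes per second, bits per second) for page counts received over a period."""
    total_bytes = sum(PAGE_SIZE_BYTES[mode] * pages for mode, pages in pages_by_mode.items())
    bytes_per_s = total_bytes / (period_hours * 3600)
    return bytes_per_s, bytes_per_s * 8

# Hypothetical workload: 10,000 pages per 24 hours on average, 2,000 pages in the peak hour.
avg_bps, avg_bits = input_flow({"bw": 7_000, "grayscale": 2_000, "color": 1_000}, 24)
peak_bps, peak_bits = input_flow({"bw": 1_400, "grayscale": 400, "color": 200}, 1)
print(f"average: {avg_bps / 1e6:.2f} MB/s ({avg_bits / 1e6:.1f} Mbit/s)")
print(f"peak:    {peak_bps / 1e6:.2f} MB/s ({peak_bits / 1e6:.1f} Mbit/s)")
```

With these assumed figures the peak hour needs roughly 0.93 MB/s (about 7.4 Mbit/s), nearly five times the 24-hour average. Color pages make up only 10% of the pages but about 60% of the bytes, which is why an all-color input is the most expensive scenario.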
Number of users
This is the number of users who access the system concurrently while document processing is in progress. There are 2 types of users:
- Scanning operators scan, check, and edit document images, then feed them to the Application Server.
- Verification operators check and correct extracted data: they download images from the Application Server and send the corrected data back to it.
Document storage time
Document storage time has a great impact on System configuration and hardware costs, because longer storage times require a larger FileStorage.
The document storage time within the System is an important parameter; it should not be confused with the document storage time inside the organization.
The average document storage time within the System is often equal to the average processing time. When several processing stages involve manual operations, this can amount to weeks.
In some cases, however, the average document storage time within the System is the average processing time plus the time that images and data are kept at the Processed stage. This happens because FlexiCapture treats a document as processed once it has been exported to the company's ERP system, even if its processing inside the organization is still in progress; the document may therefore be sent back to any of the earlier processing stages within the System.
For this reason, documents with Processed status (i.e. document images and captured data) are stored inside FlexiCapture until:
- they have gone through all business processes; and
- they have been placed in the company archives.
Important! FlexiCapture is not an archiving system per se. A typical storage time for a document within the System is 2 weeks.
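To see how storage time drives FileStorage size, one can apply steady-state reasoning in the spirit of Little's law: the volume held equals the daily input multiplied by the number of days each document stays in the System. The Python sketch below is a rough approximation under stated assumptions, not an official sizing formula: the daily input figure is hypothetical, `headroom` is an invented safety multiplier, and captured data, databases, and other overheads are ignored.

```python
def file_storage_bytes(
    daily_input_bytes: float,
    processing_days: float,
    processed_stage_days: float = 0.0,
    headroom: float = 1.0,
) -> float:
    """Rough steady-state FileStorage size in bytes.

    Each document stays for its processing time plus any time it is kept at the
    Processed stage; at a steady input rate the stored volume is roughly the daily
    input times that storage time. `headroom` is a hypothetical safety multiplier.
    """
    storage_days = processing_days + processed_stage_days
    return daily_input_bytes * storage_days * headroom

DAILY_INPUT_BYTES = 16_700_000_000  # hypothetical: ~16.7 GB of images per 24 hours

# (processing days, days kept at the Processed stage), assumed values
for processing, processed in [(3, 11), (10, 20), (30, 60)]:
    size_tb = file_storage_bytes(DAILY_INPUT_BYTES, processing, processed) / 1e12
    print(f"{processing} + {processed} days -> {size_tb:.2f} TB")
```

Under these assumptions, the typical 2-week storage time needs about 0.23 TB, while keeping documents for 90 days needs about 1.5 TB, so the storage time scales the FileStorage size linearly.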
5/25/2023 7:55:03 AM