Exporting Large Documents

Exporting to a file is the last stage of document processing. According to statistics, PDF is the most popular export format. You may need to export one-page documents as well as documents that contain several hundred pages. Exporting a whole document in one operation (simultaneous export) is inconvenient when you need to save large amounts of information: errors are hard to catch and handle, and processing can be slow even when it runs in parallel.

For exporting large documents to PDF, we recommend using the ExportFileWriter object, which extends the standard export functionality. With it, recognized data can be saved to PDF not only all at once but also in portions. The new AddPage and AddPages methods let you set the portion size, which makes the export process easier to control.

The main advantages of the new export functionality are:

  • a dramatic speed increase when exporting large documents
  • lower RAM usage
  • convenient error handling without losing all the exported data. In simultaneous export, a single error can make the whole export stage fail, and if you are processing a large number of pages, restarting the export can take a lot of time. With the new functionality, you do not have to stop processing: errors can be caught and handled within small portions, so export finishes faster even if some errors occur during processing.
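Per-portion error handling can be sketched roughly as follows. This is a minimal C++ illustration of the idea, not the ABBYY API: `PageRange`, `AddPortionFn`, and `exportInPortions` are hypothetical names, and the `addPortion` callback stands in for whatever ExportFileWriter call you use to add a portion of pages (for example, AddPages), whose real signature is not shown here. The sketch assumes that a failed portion surfaces as a C++ exception and that the writer remains usable after it, as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// A half-open range of page indices [first, last) exported as one portion.
struct PageRange {
    std::size_t first;
    std::size_t last;
};

// Hypothetical callback: adds the given pages to the output file
// (in a real application, a wrapper around your ExportFileWriter code).
// It is assumed to throw if the portion cannot be exported.
using AddPortionFn = std::function<void(const PageRange&)>;

// Exports pageCount pages in portions of portionSize pages. An error in one
// portion is caught and recorded, and the loop moves on to the next portion,
// so pages written by earlier portions are not lost.
std::vector<PageRange> exportInPortions(std::size_t pageCount,
                                        std::size_t portionSize,
                                        const AddPortionFn& addPortion) {
    assert(portionSize > 0);
    std::vector<PageRange> failed;
    for (std::size_t first = 0; first < pageCount; first += portionSize) {
        const std::size_t last = std::min(first + portionSize, pageCount);
        const PageRange portion{first, last};
        try {
            addPortion(portion);
        } catch (const std::exception&) {
            // Only this portion is affected; keep it for a retry or a report.
            failed.push_back(portion);
        }
    }
    return failed;
}
```

With 100 pages and 30-page portions, the loop issues four portions: pages 0 to 29, 30 to 59, 60 to 89, and 90 to 99. If the second portion throws, only that range is returned in the failure list, and the other three portions are still exported.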

Recommendations for the best processing speed

  • The new export mode is worth using for documents of 50 pages or more. For documents of 500 pages or more, using ExportFileWriter is highly recommended.
  • For the best possible speed, use Batch Processor (see Processing using Batch Processor).
  • Export a fixed number of pages at a time. You will need to experiment to find the best portion size for your documents. In internal ABBYY tests, a portion size of 30 pages worked best for generic documents.
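The experiment of choosing a portion size can be automated along these lines. This hypothetical C++ sketch times one export run per candidate size and keeps the fastest; `ExportRun`, `timeRunMs`, and `pickPortionSize` are illustrative names, and the export run itself is a placeholder that you would implement with your own ExportFileWriter code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical hook: exports a representative test document using the given
// portion size (for example, by adding that many pages at a time).
using ExportRun = std::function<void(std::size_t portionSize)>;

// Wall-clock time of one export run, in milliseconds.
double timeRunMs(const ExportRun& run, std::size_t portionSize) {
    const auto start = std::chrono::steady_clock::now();
    run(portionSize);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Tries each candidate portion size and returns the one with the lowest
// measured time. The measurement is injected, so the selection logic can be
// checked without running a real export.
std::size_t pickPortionSize(const std::vector<std::size_t>& candidates,
                            const std::function<double(std::size_t)>& measure) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    double bestMs = std::numeric_limits<double>::infinity();
    for (std::size_t size : candidates) {
        const double ms = measure(size);
        if (ms < bestMs) {
            bestMs = ms;
            best = size;
        }
    }
    return best;
}
```

A call might look like `pickPortionSize({10, 20, 30, 50}, [&](std::size_t s) { return timeRunMs(runExport, s); })`, where `runExport` is your own (hypothetical) export function. A single run per size is noisy, so averaging several runs on documents typical of your workload gives a more reliable choice.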

Speed testing results

The speed test results are presented in the diagram.

As the diagram shows, processing with the new export functionality is about 4 times faster on large documents than processing with standard export.

Test configuration: Intel® Core™ i5-3450 processor (3.10 GHz, 4 physical cores), 8 GB of RAM, and 4 processes running simultaneously. During session export (export in portions), the documents were saved 30 pages at a time.

Session export using Batch Processor

See a sample implementation of session export below.

C++ code

C# code

See also

BatchProcessor

MultiProcessingParams

Parallel Processing with ABBYY FineReader Engine

9/17/2024 3:14:40 PM
