PDF Conversion
The PDF format is widely used in electronic archives for data storage. It is the format of choice because it is versatile and can hold both images and text.
Technologies developed by ABBYY allow recognized texts to be saved in PDF and PDF/A formats. One of the main goals of archiving is to achieve the smallest possible file size without losing data quality.
A special compression technology called MRC (Mixed Raster Content) is used to minimize the size of PDF and PDF/A files.
PDF Input
Intelligent PDF processing | ABBYY FineReader Engine analyses internal information within the source PDF files. The SDK improves PDF conversion performance and speed through efficient and accurate text selection. If text is embedded in the PDF file, the OCR engine examines the integrity of the text layer and decides, block by block, whether to extract the text or apply OCR. |
Capture of internal PDF information | The Engine extracts internal PDF links, hyperlinks, and document properties such as subject, author, title, and keywords. |
Note: The restrictions on the input PDF document will influence the importing and processing of the document. For example, if text copying is not allowed, the PDF document will not be processed. Please make sure the PDF files you are going to process are not protected against copying.
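The block-by-block choice between text extraction and OCR can be pictured with a minimal sketch. This is not ABBYY's algorithm: the `Block` type, the `text_layer_is_intact` heuristic, and its 0.9 threshold are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical page block: a named region with optional embedded text."""
    name: str
    embedded_text: Optional[str]  # None when the block has no text layer


def text_layer_is_intact(text: Optional[str], min_printable_ratio: float = 0.9) -> bool:
    """Assumed heuristic: text counts as intact when it is non-empty and almost
    all of its characters are printable and not the U+FFFD replacement mark."""
    if not text or not text.strip():
        return False
    good = sum(1 for ch in text if ch.isprintable() and ch != "\ufffd")
    return good / len(text) >= min_printable_ratio


def choose_action(block: Block) -> str:
    """Return 'extract' to reuse the embedded text, or 'ocr' to recognize the block."""
    return "extract" if text_layer_is_intact(block.embedded_text) else "ocr"


def plan_page(blocks):
    """Decide per block, so one damaged block does not force OCR on the whole page."""
    return {block.name: choose_action(block) for block in blocks}
```

Deciding per block rather than per page means a page that mixes clean embedded text with a scanned stamp or a corrupted text run only pays the OCR cost where it is needed.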
PDF Output
PDF security and encryption support | ABBYY FineReader Engine 12 supports a variety of PDF security settings, increasing its applicability for government agencies and other organizations demanding high security. |
Output in Tagged PDF format | Tagged PDF can be "reflowed" to fit different page or screen widths. Ideal for use with handheld devices (PDAs) or screen readers typically used by visually impaired users. |
Page size | Ability to set the size for all pages of an output file during PDF conversion. |
Metadata export | ABBYY FineReader Engine 12 enables metadata exporting (bookmarks, hyperlinks, cross-references, etc.). |
Conversion to PDF/A format | PDF/A is recommended as a standard for long-term preservation of page-oriented documents. ABBYY’s technologies can save documents in PDF/A formats of different compliance levels: PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b, PDF/A-3u. PDF/A-1a gives the best retention of document formatting, logical structure, and appearance, and its content is organized so that the document appearance is retained on displays of different sizes. PDF/A-1b reproduces the document appearance only. PDF/A-2a, PDF/A-2b, and PDF/A-2u support JPEG 2000 image compression, transparency, and layers; the difference is that all text in PDF/A-2u has Unicode mapping. PDF/A-3a, PDF/A-3b, and PDF/A-3u support attaching documents of any format (such as Excel, Word, HTML, CAD, XML) to a PDF document. |
Conversion to PDF/UA format | ABBYY FineReader Engine 12 supports export to PDF in accordance with the PDF/UA standard. The PDF/UA format supports tagged PDF and assistive technologies. |
CJK to PDF export | Enables conversion of documents in Chinese (both simplified and traditional), Japanese and Korean into PDF format. |
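The PDF/A compliance levels listed above can be summarized as a small lookup. The sketch below is illustrative only: `describe_level` and the feature labels are hypothetical names, and the points marked as assumptions come from the PDF/A standard (ISO 19005) rather than from this page.

```python
import re

# Features per PDF/A part, as listed in the table above.
# Assumption: PDF/A-3 keeps the PDF/A-2 features, because ISO 19005-3
# is defined as an extension of part 2.
PART_FEATURES = {
    1: frozenset(),
    2: frozenset({"jpeg2000", "transparency", "layers"}),
    3: frozenset({"jpeg2000", "transparency", "layers", "attachments"}),
}

# The eight levels named in the text; note that there is no PDF/A-1u.
SUPPORTED_LEVELS = {"1a", "1b", "2a", "2b", "2u", "3a", "3b", "3u"}


def describe_level(name: str) -> dict:
    """Split a name such as 'PDF/A-2u' into part and conformance letter
    and report the features associated with it."""
    match = re.fullmatch(r"PDF/A-(\d)([a-z])", name)
    if match is None or match.group(1) + match.group(2) not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported PDF/A level: {name!r}")
    part, letter = int(match.group(1)), match.group(2)
    return {
        "part": part,
        "conformance": letter,
        "features": PART_FEATURES[part],
        # Stated in the text for PDF/A-1a; assumed to hold for 2a and 3a too.
        "logical_structure": letter == "a",
        # Stated in the text for PDF/A-2u; assumed to hold for 3u, and for
        # level "a", which the standard defines as a stricter level.
        "unicode_mapping": letter in {"a", "u"},
    }
```

For example, `describe_level("PDF/A-3b")` reports that attachments are available, while `describe_level("PDF/A-1u")` raises `ValueError` because that level is not among those listed.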
PDF (PDF/A) MRC compression
A special compression technology called MRC (Mixed Raster Content) is used to minimize the size of PDF and PDF/A files.
Document image files are usually very large due to the background, which often takes up to 90% of the file size. The background may, however, be unnecessary in the resulting document. It is the text and pictures that are important.
The MRC compression technology locates the color background and either removes it or compresses it heavily. This leaves text and pictures on a white background, which reduces the file size.
Picture objects (diagrams, graphs, logos, photos, drawings, stamps, signatures, etc.) are also slightly compressed, but only to an extent that doesn’t lower the quality.
The MRC technology analyzes the outlines of similar characters in the document, creates an average character template and uses it instead of a character itself. This leads to better readability, because some of the text defects are corrected, and the character outlines become more precise.
As a result, you get a smaller image which looks even better than before. The resulting document will have an unobtrusive bland background with fine text and pictures.
This "reconstruction" of the document is useful for low-quality images caused by bad lighting, out-of-focus photos, incorrect scanning or photo parameters, dark uncoated paper, or document dilapidation.
Such images have a dark background with additional textures, and the text appears blurred and difficult to read.
The MRC technology improves document appearance and produces files up to 8-10 times smaller than JPEG.
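The effect of removing a textured background can be shown with a toy example. This is not ABBYY's MRC implementation, and it covers only the background-removal idea (real MRC splits a page into separately compressed layers). The synthetic page, the fixed threshold of 128, and the use of `zlib` as a stand-in compressor are all assumptions for illustration.

```python
import random
import zlib

WHITE = 255


def make_noisy_page(width=64, height=64, seed=0):
    """Build a synthetic grayscale page: textured grey background plus
    dark horizontal strokes standing in for lines of text."""
    rng = random.Random(seed)
    page = [[rng.randint(170, 220) for _ in range(width)] for _ in range(height)]
    for y in range(8, height, 8):
        for x in range(8, width - 8):
            page[y][x] = rng.randint(10, 40)
    return page


def drop_background(page, threshold=128):
    """Keep pixels darker than `threshold` (assumed to be text) and
    replace everything else with plain white."""
    return [[px if px < threshold else WHITE for px in row] for row in page]


def compressed_size(page):
    """Size in bytes after lossless zlib compression of the raw pixels."""
    raw = bytes(px for row in page for px in row)
    return len(zlib.compress(raw, 9))


page = make_noisy_page()
cleaned = drop_background(page)
before, after = compressed_size(page), compressed_size(cleaned)
print(f"with background: {before} bytes, background removed: {after} bytes")
```

The textured background is close to random noise, which a general-purpose compressor cannot shrink much. Once it is replaced by uniform white, long identical runs dominate and the compressed size drops sharply, while the dark "text" pixels are left untouched.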
Clear and simple PDF Conversion
ABBYY FineReader Engine provides developers with special tools to achieve the optimal PDF conversion mode appropriate for their particular needs.
PDF Export Scenario | Description |
---|---|
MaxQuality | Optimizes the PDF (PDF/A) export for the best quality of the resulting file. |
Balanced | Balances the PDF (PDF/A) export between the quality of the resulting file, its size, and the processing time. |
MinSize | Optimizes the PDF (PDF/A) export for the minimum size of the resulting file. |
MaxSpeed | Optimizes the PDF (PDF/A) export for the highest processing speed. |
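As a rough illustration of how an application might choose among these four scenarios, the sketch below maps a single stated priority to a scenario name. The `PdfExportScenario` enum and `pick_scenario` helper are hypothetical and are not part of the Engine's API; only the four scenario names come from the table above. How the chosen value is passed to the Engine is not shown here.

```python
from enum import Enum


class PdfExportScenario(Enum):
    """Scenario names from the table above; the enum itself is illustrative."""
    MAX_QUALITY = "MaxQuality"
    BALANCED = "Balanced"
    MIN_SIZE = "MinSize"
    MAX_SPEED = "MaxSpeed"


def pick_scenario(priority: str = "") -> PdfExportScenario:
    """Map one stated priority ('quality', 'size', or 'speed') to a scenario.
    Any other value, including an empty string, falls back to Balanced."""
    mapping = {
        "quality": PdfExportScenario.MAX_QUALITY,
        "size": PdfExportScenario.MIN_SIZE,
        "speed": PdfExportScenario.MAX_SPEED,
    }
    return mapping.get(priority.strip().lower(), PdfExportScenario.BALANCED)
```

An archive that cares most about storage could call `pick_scenario("size")`, while a caller with no particular preference gets the Balanced scenario by default.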
17.09.2024 15:14:40