Export formats

REST API methods return image recognition results in the following formats:

Export format                           Value to pass to the request
JSON (text only)                        JsonTextOnly
JSON (preserve document structure)      JsonPreserveDocumentStructure
PDF/A-3a (smaller file size)            Pdf_A_3a_small
PDF/A-3a (maximum quality)              Pdf_A_3a_max
PDF/A-3b (smaller file size)            Pdf_A_3b_small
PDF/A-3b (maximum quality)              Pdf_A_3b_max
PDF (image only)                        PdfImageOnly
TXT                                     Txt
DOCX Editable                           DocxEditable
DOCX Exact                              DocxExact
XLSX                                    Xlsx
PPTX                                    Pptx
XML (text only)                         XmlTextOnly
XML (preserve document structure)       XmlPreserveDocumentStructure
ALTOXML (text only)                     AltoXmlTextOnly
ALTOXML (preserve document structure)   AltoXmlPreserveDocumentStructure
TIFF image                              Tiff
JPEG image                              Jpeg
JPEG 2000 image                         Jpeg2000
PNG image                               Png
HTML                                    Html

Note: You can specify only one export format in the Format parameter of the process method.
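
A client can check the Format value before sending a request. The following is a minimal Python sketch, not part of the API: the name validate_export_format is hypothetical, and the case-sensitive comparison is an assumption. The set holds the values from the "Value to pass to the request" column.

```python
# Values from the "Value to pass to the request" column of the table.
EXPORT_FORMATS = frozenset({
    "JsonTextOnly", "JsonPreserveDocumentStructure",
    "Pdf_A_3a_small", "Pdf_A_3a_max", "Pdf_A_3b_small", "Pdf_A_3b_max",
    "PdfImageOnly", "Txt", "DocxEditable", "DocxExact", "Xlsx", "Pptx",
    "XmlTextOnly", "XmlPreserveDocumentStructure",
    "AltoXmlTextOnly", "AltoXmlPreserveDocumentStructure",
    "Tiff", "Jpeg", "Jpeg2000", "Png", "Html",
})


def validate_export_format(value: str) -> str:
    """Return value unchanged if it names exactly one export format.

    Hypothetical client-side helper. The comparison is case-sensitive,
    which is an assumption about how the service treats the value.
    """
    if value not in EXPORT_FORMATS:
        # A combined value such as "Txt,Html" ends up here as well,
        # because only one format may be specified per request.
        raise ValueError(f"Not a single known export format: {value!r}")
    return value
```

With this helper, validate_export_format("DocxExact") returns the value unchanged, while "Txt,Html" raises ValueError.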

Note: Jpeg, Jpeg2000, and Png files can hold only one page. When a multi-page document is exported to one of these formats, a separate file is created for each page of the source document, and all of these files are packed into a single ZIP archive. In this case, the Content-Type header of the response with the processing results is set to application/zip. For all other export formats, and for single-page Jpeg, Jpeg2000, and Png results, the Content-Type header is set to application/octet-stream.
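
The Content-Type rule can be handled on the client as in the following minimal Python sketch. It assumes the response header value and the response body have already been obtained. The function name unpack_result and the placeholder key "result" are hypothetical. The names of the files inside the archive are not specified here, so the sketch returns whatever names the archive contains.

```python
import io
import zipfile
from typing import Dict


def unpack_result(content_type: str, body: bytes) -> Dict[str, bytes]:
    """Map file names to file contents for one processing result.

    Hypothetical helper: content_type is the Content-Type header value
    of the response, body is the raw response body.
    """
    # Ignore any parameters after ";" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/zip":
        # Multi-page Jpeg, Jpeg2000, or Png export: one file per page.
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {
                name: archive.read(name)
                for name in archive.namelist()
                if not name.endswith("/")  # skip directory entries
            }

    if media_type == "application/octet-stream":
        # Any other export: the body is the exported file itself.
        # "result" is a placeholder key, not a name sent by the service.
        return {"result": body}

    raise ValueError(f"Unexpected Content-Type: {content_type!r}")
```

Returning a dictionary in both cases lets the calling code save the files in one loop, whether the export produced one file or one file per page.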

PDF file size. For PDF export, you can choose between "smaller file size" and "maximum quality." Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.

Text only. The exported file will only contain recognized text without preserving the document layout. This mode is more efficient for structured documents such as invoices and receipts, and is focused on extracting text blocks.

Preserve document structure. The exported file will contain the recognized text together with the document layout. This mode is more efficient for documents without a predefined structure, such as contracts and agreements. It is designed to detect document structure more precisely.

DOCX Editable. The exported DOCX file preserves the original format and text flow but at the same time allows for easy editing. The output document may differ from the original image.

DOCX Exact. The exported DOCX file maintains the formatting of the original document. This may limit the changes that can be made to the text and formatting of the output document.

Available export formats for handwritten text recognition:

  • JSON (text only)
  • XML (text only)
  • TXT

Available languages for handwritten text recognition:

  • English
  • German
  • French
  • Spanish
  • Japanese

Note: For Japanese, handwritten text recognition is available with Japanese (Modern) or with the combination of English and Japanese, but not with Japanese alone.
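
The handwriting restrictions above can be checked before a request is sent. The following is a minimal Python sketch under stated assumptions: check_handwriting_request is a hypothetical helper, the format values are taken from the table of export formats, and the language names are the display names from the list above, which may differ from the identifiers the request expects. The Japanese rule from the note is not encoded.

```python
# Request values for JSON (text only), XML (text only), and TXT.
HANDWRITING_FORMATS = frozenset({"JsonTextOnly", "XmlTextOnly", "Txt"})

# Display names from the list above. The identifiers that a request
# actually expects may differ (assumption).
HANDWRITING_LANGUAGES = frozenset(
    {"English", "German", "French", "Spanish", "Japanese"}
)


def check_handwriting_request(export_format, languages):
    """Return a list of problems; an empty list means the check passed.

    Hypothetical client-side check. It does not encode the special
    rule for Japanese described in the note.
    """
    problems = []
    if export_format not in HANDWRITING_FORMATS:
        problems.append(
            f"export format {export_format!r} is not available "
            "for handwritten text recognition"
        )
    for language in languages:
        if language not in HANDWRITING_LANGUAGES:
            problems.append(
                f"language {language!r} is not available "
                "for handwritten text recognition"
            )
    return problems
```

For example, check_handwriting_request("Txt", ["English", "German"]) returns an empty list, while a call with "DocxExact" and ["Italian"] reports two problems.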

19.02.2024 10:23:36
