Export formats
REST API methods return image recognition results in the following formats:
Export format | Value to pass in the request |
---|---|
JSON (text only) | JsonTextOnly |
JSON (preserve document structure) | JsonPreserveDocumentStructure |
PDF/A-3a (smaller file size) | Pdf_A_3a_small |
PDF/A-3a (maximum quality) | Pdf_A_3a_max |
PDF/A-3b (smaller file size) | Pdf_A_3b_small |
PDF/A-3b (maximum quality) | Pdf_A_3b_max |
PDF (image only) | PdfImageOnly |
TXT | Txt |
DOCX Editable | DocxEditable |
DOCX Exact | DocxExact |
XLSX | Xlsx |
PPTX | Pptx |
XML (text only) | XmlTextOnly |
XML (preserve document structure) | XmlPreserveDocumentStructure |
ALTOXML (text only) | AltoXmlTextOnly |
ALTOXML (preserve document structure) | AltoXmlPreserveDocumentStructure |
TIFF image | Tiff |
JPEG image | Jpeg |
JPEG 2000 image | Jpeg2000 |
PNG image | Png |
HTML | Html |
Note: You can specify only one export format in the Format parameter of the process method.
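As a minimal sketch of the single-format rule, a client could check the value before sending a request. The set below is copied from the table above; the function name `validate_format` is hypothetical and not part of the API, and exact, case-sensitive matching is an assumption.

```python
# Values copied from the "Value to pass to the request" column of the table.
EXPORT_FORMATS = frozenset({
    "JsonTextOnly", "JsonPreserveDocumentStructure",
    "Pdf_A_3a_small", "Pdf_A_3a_max", "Pdf_A_3b_small", "Pdf_A_3b_max",
    "PdfImageOnly", "Txt", "DocxEditable", "DocxExact", "Xlsx", "Pptx",
    "XmlTextOnly", "XmlPreserveDocumentStructure",
    "AltoXmlTextOnly", "AltoXmlPreserveDocumentStructure",
    "Tiff", "Jpeg", "Jpeg2000", "Png", "Html",
})


def validate_format(value: str) -> str:
    """Return `value` if it is exactly one known export format.

    Hypothetical client-side helper. Anything else, including a
    comma-separated list such as "Txt,Png", raises ValueError.
    Assumption: the service matches values exactly as written in the table.
    """
    if value not in EXPORT_FORMATS:
        raise ValueError(f"expected exactly one export format, got {value!r}")
    return value
```

The returned value would then be passed as the Format parameter of the process method.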
Note: Some export formats (Jpeg, Jpeg2000, Png) can hold only a single page. When a multi-page document is exported to one of these formats, a separate file is created for each page of the source document, and all of these files are packed into a zip archive. In this case, the Content-Type header of the response with the processing results is application/zip. For all other export formats, and for single-page Jpeg, Jpeg2000, and Png results, the Content-Type header is application/octet-stream.
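The Content-Type rule in this note can be handled on the client roughly as follows. This is a sketch, not the service's own client code: the function name `split_result` and the placeholder key `"result"` are hypothetical, and the names of the files inside the archive are taken as they come, since they are not specified here.

```python
import io
import zipfile


def split_result(content_type: str, body: bytes) -> dict:
    """Return a {name: bytes} mapping for a processing result.

    For an application/zip response, each archive member (one per page)
    becomes one entry. Any other response is returned as a single entry
    under the placeholder key "result" (a name chosen for this sketch).
    """
    # Ignore optional header parameters and letter case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/zip":
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    return {"result": body}
```

Branching on the header rather than on the requested format means the same code handles both single-page and multi-page image exports.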
PDF file size. For PDF export, you can choose between "smaller file size" and "maximum quality." Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.
Text only. The exported file contains only the recognized text; the document layout is not preserved. This mode is more efficient for structured documents such as invoices and receipts, and is focused on extracting text blocks.
Preserve document structure. The exported file contains the recognized text together with the preserved document layout. This mode is more efficient for documents without a predefined structure, such as contracts and agreements. It is designed for more precise detection of document structure.
DOCX Editable. The exported DOCX file preserves the original formatting and text flow while remaining easy to edit. The output document may differ from the original image.
DOCX Exact. The exported DOCX file maintains the formatting of the original document. This may limit the changes that can be made to the text and formatting of the output document.
Available export formats for handwritten text recognition:
- JSON (text only)
- XML (text only)
- TXT
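
The format restriction above can be expressed as a simple membership check. This is a sketch only: the request values are taken from the export-format table (JSON (text only) is JsonTextOnly, XML (text only) is XmlTextOnly, TXT is Txt), and the function name `supports_handwriting` is hypothetical.

```python
# Request values for the three formats listed for handwritten text recognition.
HANDWRITING_FORMATS = frozenset({"JsonTextOnly", "XmlTextOnly", "Txt"})


def supports_handwriting(export_format: str) -> bool:
    """Return True if the export format is listed for handwritten text."""
    return export_format in HANDWRITING_FORMATS
```
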
Available languages for handwritten text recognition:
- English
- German
- French
- Spanish
- Japanese
Note: Handwritten text recognition is available for Japanese (Modern) or the combination of English and Japanese languages, but not for Japanese alone.
19.02.2024 10:23:36