Key New Features

Comparing Documents

New "Compare Documents" Module

For quick verification of a document's integrity, the new "Compare Documents" Module in ABBYY FineReader Engine detects content differences between two versions of the same document.

Comparison of bilingual documents

A new option of the "Compare Documents" Module automatically detects that a document is bilingual and has a complex layout, and compares each column (and thus each language version) separately.
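
To make the idea of detecting content differences between two versions concrete, here is a minimal sketch that uses Python's standard `difflib` on plain text. It does not use the FineReader Engine API; the function name `compare_versions` and the sample contract sentences are hypothetical, and the text is assumed to have been extracted from both versions already.

```python
import difflib


def compare_versions(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines describing how new_text differs from old_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="version_a",  # hypothetical labels for the two versions
            tofile="version_b",
            lineterm="",
        )
    )


# Hypothetical text extracted from two versions of the same contract.
version_a = "Payment is due within 30 days.\nThe supplier ships the goods."
version_b = "Payment is due within 60 days.\nThe supplier ships the goods."

changes = compare_versions(version_a, version_b)
for line in changes:
    print(line)
```

Lines prefixed with `-` appear only in the first version and lines prefixed with `+` only in the second. For a bilingual document, the same comparison could be run once per column.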

Input of Office formats

Processing of Office documents

In addition to a broad set of image formats, FineReader Engine can now process input documents created in one of the following Office document formats:

  • Text documents: .doc, .docx, .rtf, .htm / .html, .txt, .odt
  • Tables: .xls, .xlsx, .ods
  • Presentations: .ppt, .pptx, .odp

Opening Office documents from memory

The new method for opening Microsoft Office and Apache OpenOffice files directly from memory speeds up the document import step, which shortens overall document processing time.

MRZ Capture

Data capture from a Machine-Readable Zone (MRZ)

The new feature automatically extracts data from the machine-readable zone (MRZ) of ID documents, which speeds up entry and verification of personal data during customer onboarding or verification processes.
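
Verification of captured MRZ data typically relies on the check digits defined in ICAO Doc 9303. The sketch below shows that calculation in Python. It is an illustration of the standard, not of the Engine's internal implementation; the function names are hypothetical, and the sample values come from the ICAO specimen passport.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value per ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit using the repeating weights 7, 3, 1 (mod 10)."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check_digit: str) -> bool:
    """Return True when the captured check digit matches the computed one."""
    return check_digit.isdigit() and mrz_check_digit(field) == int(check_digit)


# Fields from the ICAO 9303 specimen: document number, birth date, expiry date.
print(mrz_check_digit("L898902C3"))
print(field_is_valid("740812", "2"))
print(field_is_valid("120415", "9"))
```

A mismatch between the computed and the captured check digit is a useful signal that a field was misread and needs manual review.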

Improved Japanese OCR

Leading recognition accuracy

With the new version of ABBYY FineReader Engine, Japanese OCR has seen major improvements, bringing recognition accuracy to a level previously unattainable for most solutions.

Improved Arabic OCR

End-to-end recognition for Arabic on poor images

End-to-end recognition improves Arabic OCR on low-quality images, where general-purpose technology produces low-confidence results with many errors.

Improved Korean OCR

Deep learning language model for Korean

A trained model for the Korean language selects the best word variant from the recognition hypotheses, or even generates a new one based on the recognition context (the preceding and following words).

New neural network-based OCR technologies

Improvements in OCR technologies

To bring neural network approaches into its OCR technologies, ABBYY FineReader Engine has been enhanced with new features for processing handprinted and Latin characters:

  • Language model for consistent and accurate choice of word variants
  • End-to-end recognition for Latin scripts to process multilingual documents

Machine learning barcode recognition technology

The neural network architecture introduces a new barcode recognition model that detects the approximate region of a barcode, classifies it, and outputs the region together with the most likely barcode type.

New recognition mode

The new Accurate mode delivers the maximum output quality at the cost of a reasonable slowdown in recognition speed. This mode is best suited for low-quality or photographed invoices, contracts, receipts, and ID cards.

OCR quality improvements for text near stamps and signatures

Detecting text near stamps and signatures

When an agreement contains stamps or signatures, the nearby text is recognized separately from them, which improves the quality of the processed documents.

New licensing options

Online License usage as Network and Standalone

The Developer's Help of FineReader Engine 12 has been extended with additional information about the ways to license the SDK, describing the individual licensing options in an easy-to-understand comparison table.

Using grace periods

With the new option, customers can keep using the ABBYY FineReader Engine license for some time after the expiration date, which extends the license validity period.

ICR and OMR technologies in Linux version

Handprinted text and checkmark recognition

With ABBYY FineReader Engine 12, you can recognize handprinted characters and checkmarks of various types. ICR and OMR technologies let you extract data from handwritten documents and develop new data extraction solutions.

Ability to run Engine in cloud environments

New deployment options

A new licensing type allows deployment in virtual and cloud environments, so you can offer a broader spectrum of solutions. The licensing mechanism requires an internet connection and supports proxy servers.

New libraries in ABBYY FineReader Engine

NeoML library usage

NeoML is an open-source end-to-end machine learning framework that allows you to build, train, and deploy machine learning models. This framework is used by engineers for computer vision and natural language processing tasks, including image preprocessing, classification, document layout analysis, OCR, and data extraction from structured and unstructured documents.

Embedded PDFium for processing PDFs

PDFium is a cross-platform native library conforming to PDF standards and controlling all operations related to PDF, including processing, parsing, rendering, and obtaining the output.

Enhanced Document Classification

Document Classification using NLP and Machine Learning

With ABBYY FineReader Engine 12, incoming documents can be automatically sorted into different categories. Machine learning, OCR, and natural language processing technologies are employed to train the image-based and text-based classifiers on representative documents. The information obtained is then used during the classification step.

Text-based classifier: advanced security of training data

To train and optimize the text-based classifier, documents representing each document category must be imported. To protect the data contained in these documents, hashing algorithms are applied so that information cannot be recovered from the sample documents.

Enhanced Classification Demo Sample

ABBYY FineReader Engine can process PDFs, scanned or photographed document images, and documents in Office formats. To reflect this capability in the classification process, the provided pre-compiled Demo Sample for classification has been enhanced and now allows importing Office documents in addition to PDFs and image formats.
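
The general idea behind hashing training data for a text-based classifier can be sketched with the well-known "hashing trick": each word is replaced by a bucket index derived from a hash, so the stored features contain counts but not the words themselves. This is only an illustration under stated assumptions. The source does not say which hashing algorithm the Engine uses, and the function name `hashed_features`, the choice of SHA-256, and the bucket count are all hypothetical.

```python
import hashlib


def hashed_features(text: str, num_buckets: int = 1024) -> dict[int, int]:
    """Turn text into {bucket_index: count} without keeping the original words."""
    counts: dict[int, int] = {}
    for token in text.lower().split():
        # SHA-256 is an assumption made for this sketch only.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % num_buckets
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


# Hypothetical training sample for an "invoice" category.
features = hashed_features("Invoice total due invoice number")
print(features)
```

Because several words can fall into the same bucket, the mapping is not uniquely reversible, while identical words always map to the same bucket, so a classifier can still learn from the counts.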

Code sample for command-line interface (CLI)

Ready-to-use code sample

With this code sample, developers can efficiently utilize ABBYY FineReader Engine libraries and integrate document processing capabilities in command-line-based applications.

Implementation of PDF meta-data extractor

Digitally-born PDF file processing

AuxInfo is a supplementary PDFium object that provides metadata from a PDF file. The ABBYY R&D PDFTools team implemented its own AuxInfo object that works with PDFium.

Improved PDF processing

Improvements for PDF with "mixed" contents

ABBYY FineReader Engine provides new capabilities for processing PDF documents that contain both image-only and digitally-born pages:

  • Adaptive recognition to improve and speed up PDF processing
  • Text layer quality classifier that preserves a good-quality text layer in the output format
  • Indication of digital signature presence in PDF
  • New content reuse mode for processing documents with mixed contents

Using additional content in PDF

For more flexible composition of PDF contents, ABBYY FineReader Engine offers new options:

  • Opening PDF Portfolios and processing their contents
  • Adding custom images to the output PDF and managing their positions

Additional language support

Farsi OCR

ABBYY FineReader Engine features updated and improved Farsi recognition options, enabling more effective work with documents from Iran, Afghanistan, and many other countries of the Middle East.

Georgian OCR

The Georgian language was added as a new OCR language.

OCR for simple mathematical formulas

Extracting the characters of simple mathematical formulas improves recognition of scientific documents that contain simple single-line formulas inside the text.

Technical preview for Burmese OCR

Burmese OCR was added as a technical preview to highlight future capabilities.

Technical preview for Bangla OCR

Bangla OCR was added as a technical preview to demonstrate potential functionality.

Improved document layout recreation

Improved table reconstruction

With ABBYY FineReader Engine 12, tables extracted from documents keep their formatting better than ever.

Detection and recreation of balanced columns

When a document contains balanced columns of text (e.g., contracts, scientific papers, or articles), the initial structure now stays intact, which simplifies document processing.

New "single-column" document model

The main improvements of the new algorithm are in the detection and analysis of tables and charts.

Enhanced table structure analysis

With the improved document conversion mechanism, ABBYY FineReader Engine can detect tables with columns of numbers in the "Accounting" format.
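
Once such a table has been extracted, the cell text usually still has to be converted to numbers. The sketch below parses cells under the assumption that "Accounting" format means the common spreadsheet convention: thousands separators, an optional currency symbol, negative values in parentheses, and a dash for zero. The function name `parse_accounting` and the handling of the `$` symbol are hypothetical and not part of the Engine.

```python
from decimal import Decimal


def parse_accounting(cell: str) -> Decimal:
    """Parse one accounting-style cell such as '(1,234.50)' or '$ 2,000.00'."""
    text = cell.strip()
    # Parentheses mark a negative amount in accounting notation.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Drop the currency symbol and thousands separators (assumed: '$' and ',').
    text = text.replace("$", "").replace(",", "").strip()
    if text == "-":
        return Decimal("0")  # a lone dash denotes zero
    value = Decimal(text)
    return -value if negative else value


# Hypothetical column of recognized cell values.
column = ["$ 2,000.00", "(1,234.50)", " - ", "15.25"]
values = [parse_accounting(cell) for cell in column]
print(sum(values))
```

Using `Decimal` rather than `float` avoids rounding surprises when the amounts are summed or compared.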

Internal process optimization for faster processing

New scheme of the ILayout object iteration

A new scheme speeds up iteration over the ILayout object obtained when the document is processed outside the main process.

Online documentation

Documentation available online

In addition to the built-in documentation, you can now use the online version, which provides "just in time" information about the features and capabilities of ABBYY FineReader Engine.

New export formats

New ALTO versions support

ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata to describe the layout and content of physical text resources, such as the pages of a book or newspaper. The latest versions of this schema (4.0, 4.1, 4.2) are supported in FineReader Engine 12.

PDF/A-2b and PDF/A-3b support

PDF/A is an ISO-standardized version of the Portable Document Format (PDF), specialized for use in archiving and the long-term preservation of electronic documents. Now, FineReader Engine supports all PDF/A conformance levels.
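
For readers new to ALTO, the following sketch shows how the text of an ALTO 4.x file could be read back with Python's standard library. The sample XML is hand-written for illustration and is not actual Engine output; the namespace URI is the one used by the ALTO version 4 schema, and the function name `text_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the ALTO version 4 schema.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hand-written minimal sample; real files carry far more metadata.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="Hello" HPOS="10" VPOS="20" WIDTH="50" HEIGHT="12"/>
            <SP/>
            <String CONTENT="world" HPOS="65" VPOS="20" WIDTH="52" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def text_lines(alto_xml: str) -> list[str]:
    """Return the text of every TextLine, joining its String elements with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


print(text_lines(SAMPLE_ALTO))
```

The `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes carry the position of each word on the page, which is what makes ALTO useful for layout-aware applications.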


03.07.2024 8:50:25
