FRDocument Object (IFRDocument Interface)

This object corresponds to a processing document that may contain several pages. The FRDocument object is a root for a collection of document pages. Each page contains an open image and image layout.

The FRDocument object provides all the methods needed for document processing. You may process a document with a single call (the Process method), or process it step by step by performing analysis, recognition, synthesis, and export. In general document processing scenarios, all of these steps are usually required. See the descriptions of the corresponding methods for details. When you have finished working with the FRDocument object, call the Close method to release all the resources it used.
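
The step-by-step sequence described above can be sketched as follows. This is a minimal illustration, not Engine code: MockDocument, CloseGuard, and runStepByStep are hypothetical stand-ins that only record the order of calls, and the real FRDocument methods take parameters that are omitted here. The guard shows one possible way to make sure Close is called even if a step throws.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for FRDocument: each method only records its name.
// The real methods take parameters, which are left out of this sketch.
struct MockDocument {
    std::vector<std::string> calls;
    void Analyze()    { calls.push_back("Analyze"); }
    void Recognize()  { calls.push_back("Recognize"); }
    void Synthesize() { calls.push_back("Synthesize"); }
    void Export()     { calls.push_back("Export"); }
    void Close()      { calls.push_back("Close"); }
};

// Hypothetical helper: calls Close() when it goes out of scope,
// so resources are released on every exit path.
struct CloseGuard {
    MockDocument& doc;
    explicit CloseGuard(MockDocument& d) : doc(d) {}
    ~CloseGuard() { doc.Close(); }
    CloseGuard(const CloseGuard&) = delete;
    CloseGuard& operator=(const CloseGuard&) = delete;
};

// Runs the steps in the order given in the description:
// analysis, recognition, synthesis, export, then Close via the guard.
// Returns a copy of the recorded call order.
std::vector<std::string> runStepByStep(MockDocument& doc) {
    {
        CloseGuard guard(doc);
        doc.Analyze();
        doc.Recognize();
        doc.Synthesize();
        doc.Export();
    }  // guard is destroyed here and calls doc.Close()
    return doc.calls;
}
```

In real code the same ordering applies, with the single Process call replacing the analysis, recognition, and synthesis steps when no intermediate access to the results is needed.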

The object provides access to the different document attributes such as its author, keywords, subject, and title, via the DocumentContentInfo property.

The FRDocument object is a so-called "connectable object." To receive notification events during processing, create an object derived from the IFRDocumentEvents interface, then advise it to the FRDocument object by calling the AdviseFREngineObject global function.

Properties

Name Type Description
Application Engine, read-only Returns the Engine object.
Common attributes
AllocatedSize __int64, read-only Returns the size, in bytes, of the memory allocated for the IFRDocument object.
Pages FRPages, read-only Returns the collection of pages of the document.
PlainText PlainText, read-only Returns the text of the document in a special "plain text" format.
Document languages
BasicLanguage BSTR, read-only

Returns the main language of the recognized document. The property contains the internal name of the first language in the collection of detected languages (DetectedLanguages property).

This property has a meaningful value only if recognition was performed with automatic language detection enabled (see the IRecognizerParams::LanguageDetectionMode property for details); otherwise, it is an empty string.

DetectedLanguages DetectedLanguages, read-only

Provides access to the collection of recognition languages detected in the recognized document. Languages in the collection are sorted by frequency of occurrence, from the most frequent to the least frequent.

This property has a meaningful value only if recognition was performed with automatic language detection enabled (see the IRecognizerParams::LanguageDetectionMode property for details).

The list of languages is updated only after recognition; if you edit the layout of the document manually, the collection remains the same.

Document structure
DocumentStructureOutOfDate VARIANT_BOOL, read-only

Indicates whether the logical structure of the document is out of date. If this property is TRUE, you should perform document synthesis before export. Otherwise, an error will occur during export.

Note: Page structure, not only document structure, can become invalid. Therefore, before export you should also check that every page in the document has a valid page structure (see the IFRPage::PageStructureOutOfDate property).
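
The pre-export check described above can be sketched as follows. This is a minimal illustration under stated assumptions: PageState, DocumentState, and readyForExport are hypothetical names that model the two flags as plain booleans, and they are not part of the Engine API. With the real objects you would read DocumentStructureOutOfDate from the document and PageStructureOutOfDate from each page instead.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of one page: mirrors IFRPage::PageStructureOutOfDate.
struct PageState {
    bool pageStructureOutOfDate;
};

// Hypothetical model of a document: mirrors DocumentStructureOutOfDate
// plus the per-page flags.
struct DocumentState {
    bool documentStructureOutOfDate;
    std::vector<PageState> pages;
};

// Returns true only when neither the document structure nor any page
// structure is out of date, i.e. when export can proceed without
// running synthesis first.
bool readyForExport(const DocumentState& doc) {
    if (doc.documentStructureOutOfDate) {
        return false;
    }
    return std::none_of(doc.pages.begin(), doc.pages.end(),
                        [](const PageState& p) { return p.pageStructureOutOfDate; });
}
```

Checking both levels in one place keeps the export call from failing on a stale page even when the document-level flag is FALSE.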

Business cards
BusinessCards BusinessCards, read-only Provides access to the collection of business cards detected in the document.
Attachments and metadata
DocumentContentInfo DocumentContentInfo Contains information about the author, keywords, subject, and title of the document and stores the document information dictionary.
PDFAttachments PDFAttachments Returns the collection of attachments of the document. They are extracted from the input PDF document during opening, or you can add your own files to be attached to the output PDF file during export. To attach all the files of this collection to the output PDF file, set the IPDFExportFeatures::WriteSourceAttachments property to TRUE.
PDFFontNames StringsCollection, read-only Returns the collection of the names of fonts extracted from the resources of the input PDF file. If the document was created by opening a file in another format, or from a PDF file containing no fonts, an empty collection is returned.
SourceHasDigitalSignature VARIANT_BOOL, read-only Indicates whether at least one of the source files was a digitally signed PDF.
SourceHasTextualContent VARIANT_BOOL, read-only This property is deprecated and will be removed in future versions. To find out if the file contains text, use the CheckTextLayer method.
Temporary files and flushing policy
PageFlushingPolicy PageFlushingPolicyEnum

Specifies if the ImageDocument and the Layout objects for corresponding pages should be unloaded and saved to disk if there are no references to these objects.

When this property value is set to PFP_KeepInMemory, the image documents and layouts for unused pages are not saved to disk.

This property is PFP_Auto by default.

TempDir BSTR

Specifies the path to the folder where the temporary image files in the ABBYY FineReader Engine internal format are stored.

By default, the value of this property is "/tmp/ABBYY FineReader Engine 12".

Methods

Name Description
AddImageDocument Adds one open image, represented by the ImageDocument object, to the document.
AddImageFile Opens an image file and adds the pages corresponding to the opened file to the document.
AddImageFileFromAttachment Opens an image file from the attachment and adds the pages corresponding to the opened file to the document.
AddImageFileFromMemory Opens an image file from the global memory, where it was previously loaded by the user, and adds the pages corresponding to the opened file to the document.
AddImageFileFromStream Opens an image file from the input stream implemented by the user, and adds the pages corresponding to the opened file to the document.
AddImageFileWithPassword Opens a password-protected image file and adds the pages corresponding to the opened file to the document.
AddImageFileWithPasswordCallback Opens an image file using the IImagePasswordCallback interface and adds the pages corresponding to the opened file to the document.
AddPage Adds a page to the document.
Analyze Performs layout analysis of all pages in the document.
AnalyzePages Performs layout analysis of specified pages in a document.
CheckTextLayer Checks the text layer on the specified document pages for its presence or reliability.
Close Releases all the resources that were used by the FRDocument object and returns the object to its initial state (as after its creation with the IEngine::CreateFRDocument method).
ConvertFromOldVersion Loads from the specified folder the contents of an FRDocument object that were saved by previous versions of ABBYY FineReader Engine.
Export Saves the document into a file in an external format.
ExportPages Saves the specified pages into a file in an external format.
ExportToMemory Saves the document into memory in an external format.
LoadFromFolder Loads from the specified folder the contents of an FRDocument object that were saved by ABBYY FineReader Engine 12.
Preprocess Performs preprocessing of all pages in the document: corrects page orientation, inversion, and geometrical distortions, and splits pages if necessary.
PreprocessPages Performs preprocessing of the specified pages in the document: corrects page orientation, inversion, and geometrical distortions, and splits pages if necessary.
Process Performs preprocessing, layout analysis, recognition, and page and document synthesis of all pages in the document.
ProcessPages Performs preprocessing, layout analysis, recognition, and page and document synthesis of the specified pages in the document.
Recognize Performs recognition and page synthesis of all pages in the document.
RecognizePages Performs recognition and page synthesis of the specified pages in the document.
SaveToFolder Saves the contents of the FRDocument object to the specified folder.
SplitPages Splits each of the specified pages of the document into several pages. This method is useful if the page is a double-page spread of a book, or if the page contains images of several business cards. The method provides information on how the pages have been split.
Synthesize Performs document synthesis of all pages in the document.
SynthesizePages Performs document synthesis of the specified pages in the document.

Related objects

Object Diagram

Output parameter

This object is the output parameter of the CreateFRDocument and CreateFRDocumentFromImage methods of the Engine object.

Input parameter

This object is passed as an input parameter to the following methods:

Samples

The object is used in almost all code samples (an exception is the BatchProcessing code sample).

See also

FRPage

IFRDocumentEvents

Parallel Processing with ABBYY FineReader Engine

Working with Connectable Objects

Working with Properties

7/3/2024 8:50:25 AM
