FRDocument Object (IFRDocument Interface)
This object represents a document being processed, which may contain several pages. The FRDocument object is the root of a collection of document pages. Each page contains an open image and its layout.
The FRDocument object provides all the methods necessary for document processing. You can process a document with a single call (the Process method) or step by step, performing analysis, recognition, synthesis, and export. General document-processing scenarios usually require all of these steps. See the descriptions of the corresponding methods for details. When you have finished working with the FRDocument object, call the Close method to release all the resources that the object used.
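The call order described above can be sketched as follows. This is a minimal illustration in Python built on a stand-in class, `StubFRDocument`, which is hypothetical and not part of ABBYY FineReader Engine. Only the method names (Process, Analyze, Recognize, Synthesize, Export, Close) come from this page; the real methods take processing and export parameters that are omitted here.

```python
class StubFRDocument:
    """Hypothetical stand-in that only records which methods were called.

    It is NOT the real FRDocument object. Real methods take parameters
    (processing and export settings) that this sketch leaves out.
    """

    def __init__(self):
        self.calls = []

    def Process(self):
        self.calls.append("Process")

    def Analyze(self):
        self.calls.append("Analyze")

    def Recognize(self):
        self.calls.append("Recognize")

    def Synthesize(self):
        self.calls.append("Synthesize")

    def Export(self, path):
        self.calls.append("Export")

    def Close(self):
        self.calls.append("Close")


def process_in_one_call(document, output_path):
    """Single-call processing followed by export."""
    try:
        document.Process()
        document.Export(output_path)
    finally:
        # Release resources even if processing or export fails.
        document.Close()


def process_step_by_step(document, output_path):
    """The same stages, each invoked explicitly."""
    try:
        document.Analyze()
        document.Recognize()
        document.Synthesize()
        document.Export(output_path)
    finally:
        # Release resources even if any stage fails.
        document.Close()
```

Calling Close in a `finally` block is a design choice of this sketch: it keeps the resource release from being skipped when one of the stages raises an error.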
The object provides access to document attributes such as the author, keywords, subject, and title through the DocumentContentInfo property.
The FRDocument object is a so-called "connectable object." To receive notification events during processing, create an object that implements the IFRDocumentEvents interface, then register (advise) it with the FRDocument object by calling the AdviseFREngineObject global function.
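The connectable-object pattern mentioned above can be illustrated with a small Python sketch. Everything in it is a stand-in: `StubDocumentEvents`, `StubConnectableDocument`, `advise`, and `unadvise` are hypothetical names, the `on_progress` callback is an assumption (this page does not list the members of IFRDocumentEvents), and the returned cookie is an assumption about how a registration might be tracked. The sketch shows only the general idea of registering a listener before processing starts.

```python
class StubDocumentEvents:
    """Hypothetical listener; the on_progress callback name is an assumption."""

    def __init__(self):
        self.progress = []

    def on_progress(self, percentage):
        self.progress.append(percentage)


class StubConnectableDocument:
    """Hypothetical document that notifies registered listeners."""

    def __init__(self):
        self._listeners = {}
        self._next_cookie = 1

    def process(self):
        # Pretend processing reports progress three times.
        for percentage in (0, 50, 100):
            for listener in list(self._listeners.values()):
                listener.on_progress(percentage)


def advise(document, listener):
    """Stand-in for the role AdviseFREngineObject plays: register a listener.

    Returns a token (an assumption of this sketch) that identifies
    the registration so it can be cancelled later.
    """
    cookie = document._next_cookie
    document._next_cookie += 1
    document._listeners[cookie] = listener
    return cookie


def unadvise(document, cookie):
    """Hypothetical counterpart that cancels a registration."""
    document._listeners.pop(cookie, None)
```

A listener registered with `advise` receives every notification raised while `process` runs, and stops receiving them once `unadvise` is called with the same token.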
Properties
Name | Type | Description |
---|---|---|
Application | Engine, read-only | Returns the Engine object. |
Common attributes | ||
AllocatedSize | __int64, read-only | Returns the size of the memory allocated for the IFRDocument object. The value of this property is measured in bytes. |
Pages | FRPages, read-only | Returns the collection of pages of the document. |
PlainText | PlainText, read-only | Returns the text of the document in a special "plain text" format. |
Document languages | ||
BasicLanguage | BSTR, read-only | Returns the main language of the recognized document. The property contains the internal name of the first language in the collection of detected languages (DetectedLanguages property). This property has a meaningful value only if recognition was performed with automatic language detection turned on (see the IRecognizerParams::LanguageDetectionMode property for details); otherwise, it is an empty string. |
DetectedLanguages | DetectedLanguages, read-only | Provides access to the collection of recognition languages detected in the recognized document. Languages in the collection are sorted by frequency of occurrence, from the most frequent to the least frequent. This property has a meaningful value only if recognition was performed with automatic language detection turned on (see the IRecognizerParams::LanguageDetectionMode property for details). The list of languages is updated only after recognition; if you edit the layout of the document manually, the collection remains the same. |
Document structure | ||
DocumentStructureOutOfDate | VARIANT_BOOL, read-only | Specifies whether the logical structure of the document is out of date. If this property is TRUE, you should perform document synthesis before export; otherwise, an error will occur during export. Note: Page structure can become invalid as well as document structure. Therefore, before export you should also check that all pages in the document have a valid page structure (see the IFRPage::PageStructureOutOfDate property). |
Business cards | ||
BusinessCards | BusinessCards, read-only | Provides access to the collection of business cards detected in the document. |
Attachments and metadata | ||
DocumentContentInfo | DocumentContentInfo | Contains information about the author, keywords, subject, and title of the document and stores the document information dictionary. |
PDFAttachments | PDFAttachments | Returns the collection of attachments of the document. They are extracted from the input PDF document during opening, or you can add your own files to be attached to the output PDF file during export. To attach all the files of this collection to the output PDF file, set the IPDFExportFeatures::WriteSourceAttachments property to TRUE. |
PDFFontNames | StringsCollection, read-only | Returns the collection of the names of fonts extracted from the resources of the input PDF file. If the document was created by opening a file in another format, or from a PDF file containing no fonts, an empty collection is returned. |
SourceHasDigitalSignature | VARIANT_BOOL, read-only | Indicates whether at least one of the source files was a digitally signed PDF. |
SourceHasTextualContent | VARIANT_BOOL, read-only | This property is deprecated and will be removed in future versions. To find out whether the file contains text, use the CheckTextLayer method. |
Temporary files and flushing policy | ||
PageFlushingPolicy | PageFlushingPolicyEnum | Specifies whether the ImageDocument and Layout objects of the corresponding pages should be unloaded and saved to disk when there are no references to these objects. When this property is set to PFP_KeepInMemory, the image documents and layouts of unused pages are not saved to disk. The default value is PFP_Auto. |
TempDir | BSTR | Specifies the path to the folder where temporary image files in the ABBYY FineReader Engine internal format are stored. By default, the value of this property is "/tmp/ABBYY FineReader Engine 12". |
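The pre-export check described for the DocumentStructureOutOfDate property can be sketched as a small helper. This is a minimal illustration in Python: `StubPage` and `StubDocument` are hypothetical stand-ins, not the real FRDocument or FRPage objects, and only the property names DocumentStructureOutOfDate, PageStructureOutOfDate, and Pages come from this page. The sketch only detects stale structure; how page structure is refreshed is not covered in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubPage:
    """Hypothetical stand-in for a page with its out-of-date flag."""
    PageStructureOutOfDate: bool = False


@dataclass
class StubDocument:
    """Hypothetical stand-in for a document and its page collection."""
    DocumentStructureOutOfDate: bool = False
    Pages: List[StubPage] = field(default_factory=list)


def structure_is_out_of_date(document):
    """Return True if the document structure or any page structure is stale."""
    if document.DocumentStructureOutOfDate:
        return True
    return any(page.PageStructureOutOfDate for page in document.Pages)
```

A caller would run this check immediately before export and, if it returns True, perform synthesis first instead of exporting.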
Methods
Name | Description |
---|---|
AddImageDocument | Adds one open image, represented by the ImageDocument object, to the document. |
AddImageFile | Opens an image file and adds the pages corresponding to the opened file to the document. |
AddImageFileFromAttachment | Opens an image file from the attachment and adds the pages corresponding to the opened file to the document. |
AddImageFileFromMemory | Opens an image file from the global memory, where it was previously loaded by the user, and adds the pages corresponding to the opened file to the document. |
AddImageFileFromStream | Opens an image file from the input stream implemented by the user, and adds the pages corresponding to the opened file to the document. |
AddImageFileWithPassword | Opens a password-protected image file and adds the pages corresponding to the opened file to the document. |
AddImageFileWithPasswordCallback | Opens an image file using the IImagePasswordCallback interface and adds the pages corresponding to the opened file to the document. |
AddPage | Adds a page to the document. |
Analyze | Performs layout analysis of all pages in the document. |
AnalyzePages | Performs layout analysis of specified pages in a document. |
CheckTextLayer | Checks the text layer on the specified document pages for its presence or reliability. |
Close | Releases all the resources that were used by the FRDocument object and returns the object into the initial state (as after its creation with the IEngine::CreateFRDocument method). |
ConvertFromOldVersion | Loads the contents of the FRDocument object, which were saved by the previous versions of ABBYY FineReader Engine, from the specified folder. |
Export | Saves the document into a file in an external format. |
ExportPages | Saves the specified pages into a file in an external format. |
ExportToMemory | Saves the document into memory in an external format. |
LoadFromFolder | Loads the contents of the FRDocument object, which were saved by ABBYY FineReader Engine 12, from the specified folder. |
Preprocess | Performs preprocessing of all pages in the document: corrects page orientation, inversion, geometrical distortions, performs page splitting if necessary. |
PreprocessPages | Performs preprocessing of the specified pages in the document: corrects page orientation, inversion, geometrical distortions, performs page splitting if necessary. |
Process | Performs preprocessing, layout analysis, recognition, and page and document synthesis of all pages in the document. |
ProcessPages | Performs preprocessing, layout analysis, recognition, and page and document synthesis of the specified pages in the document. |
Recognize | Performs recognition and page synthesis of all pages in the document. |
RecognizePages | Performs recognition and page synthesis of the specified pages in the document. |
SaveToFolder | Saves the contents of the FRDocument object to the specified folder. |
SplitPages | Splits each of the specified pages of the document into several pages. This method is useful if the page is a double-page spread of a book or contains images of several business cards. The method provides information on how the pages have been split. |
Synthesize | Performs document synthesis of all pages in the document. |
SynthesizePages | Performs document synthesis of the specified pages in the document. |
Related objects
Output parameter
This object is the output parameter of the CreateFRDocument and CreateFRDocumentFromImage methods of the Engine object.
Input parameter
This object is passed as an input parameter to the following methods:
- CreateObjectFromDocument method of the ClassificationEngine object
- CompareDocuments method of the Comparator object
Samples
The object is used in almost all code samples; the BatchProcessing code sample is an exception.
See also
Parallel Processing with ABBYY FineReader Engine
03.07.2024 8:50:25