XML Result
An XML result file contains information about the parameters and results of job execution. An XML result file is created for both successful and failed jobs.
On the 6. Output tab of the Workflow Properties dialog box, you can specify a folder where the XML result file should be published.
Important! The name of the XML result file must not contain more than 64 characters.
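The length limit can be checked before a file name or naming template is put into use. The following is a minimal sketch, not part of the product: the helper name `is_valid_result_name` is hypothetical, and it assumes that the 64-character limit applies to the file name including its extension, which the text above does not state.

```python
from pathlib import PureWindowsPath

# Limit stated in the documentation above.
MAX_RESULT_NAME_LENGTH = 64


def is_valid_result_name(path: str) -> bool:
    """Return True if the file name part of `path` fits the 64-character limit.

    Assumption: the extension (".xml") counts toward the limit.
    PureWindowsPath accepts both "\\" and "/" as separators.
    """
    file_name = PureWindowsPath(path).name
    return 0 < len(file_name) <= MAX_RESULT_NAME_LENGTH


if __name__ == "__main__":
    print(is_valid_result_name(r"C:\Output\job_result.xml"))
```

Only the last path component is measured, so the length of the folder path does not affect the result.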
If errors occur during job execution and the job is marked as Failed, the XML result file and the unprocessed image files are saved to the location specified under Save failed jobs to on the 4. Quality Control tab of the Workflow Properties dialog box.
Main XML result tags
Tag | Description |
<XMLResult> |
This is the root tag. Its attributes contain the following information:
|
<InputFile> |
The attributes of this tag provide general information about the input file:
The tags embedded into <InputFile> contain the following information:
Note. If an output file is obtained by copying an input file rather than by performing OCR, this is indicated by means of an <OutputDocuments> tag inside the corresponding <InputFile> tag. Note. If an error occurs when processing an input image file, an <Error> tag will be put inside the <InputFile> tag. The <Error> tag will contain a description of the error. |
<JobDocument> |
This tag provides information about the document which was obtained by grouping together the processed input files. Depending on your document assembly settings, this document may consist of only one input file, of multiple input files, or of certain pages taken from multiple input files. The number of <OutputDocuments> tags equals the number of output files processed within the given job. The tags embedded into <OutputDocuments> contain the following information about the output files:
The following tags are used when processing e-mail messages. They show whether the document contains the body of an e-mail message or an attachment and whether it was obtained from a main message or from an attached message. The tags are used for e-mail files, which may either be received by e-mail or taken from the Input folder.
Note. If one document containing all text from an e-mail message and its attachments is created for the job, then IsMailBodyFile = true and IsMailAttachedMessageFile = false. Note. If the document contains only the text of a file attached to a main message, then IsMailBodyFile = false and IsMailAttachedMessageFile = false. Note. The IsMailBodyFile and IsMailAttachedMessageFile properties are included in the XML result file only if their values are not false. |
<ImageProcessingParams> |
The attributes of this tag contain information about the additional image processing settings, for example:
|
<RecognitionParams> |
The attributes and embedded tags of <RecognitionParams> contain the OCR settings, for example:
|
<ExportParams> |
The tags embedded into <ExportParams> contain the export parameters:
|
<Statistics> | The attributes of this tag contain the general statistics for the processed files, which combine the statistics for all executed jobs. |
XML result change log
- An Id attribute has been added to <InputFile>, which contains the identifier of the input file.
- An embedded <Page> tag has been added to <InputFile>, where Id is the identifier of the page of the input document and PageNumber is the number of the page in the input file.
- An embedded <Pages> tag has been added to <JobDocument>. <Pages> has <FileId> and <PageId> tags embedded into it. <FileId> is the identifier of the input file and <PageId> is the identifier of the page of the input document from which the processed page was obtained.
The above changes have been introduced so that users can easily see which page of the input file corresponds to which page of the output file.
- A RewriteIfFileExists attribute has been added to <FormatSettings>. If this attribute is set to true, any existing output files found in the Output folder have been overwritten.
- A SkipRecognizePdfsWithTextLayer attribute has been added to <ExportFormat>. If this attribute is set to true, the Do not modify files with high quality text layer option is enabled on the PDF dialog box.
- A SkipRecognizePdfsWithTextLayerCoefficient attribute has been added to <ExportFormat>. This attribute lists the settings that determined how the program checked the quality of the text layer in input PDF files.
- A ProhibitHiddenTextDetection attribute has been added to <RecognitionParams>. This attribute is set to true by default, which means that text in pictures in input PDF files is not recognized. If the attribute is set to false, the program recognizes text in pictures and creates a text layer in the output document.
- A TiffMaxBrokenLastLinesCount attribute has been added to the <ImageProcessingParams> element. This attribute indicates whether the processing of damaged TIFF files was enabled or disabled.
- An EnablePeriodicCrawling attribute has been added to <InputSettings>. If this attribute is set to true, the Crawl the library for new files every: N units of time option is enabled on the 1. Input tab of the Workflow Properties dialog box. The default value is false.
- A CrawlingInterval attribute has been added to <InputSettings>. The value of this attribute indicates how often the program checks for new files in Document Library workflows.
- A <BackgroundColorDetectionParams> tag with a BackgroundColorDetectionType attribute has been added to the <ImageProcessingParams> tag. This attribute indicates which color was used to fill blank areas that appeared after images were deskewed. The default value is Auto.
- An IndexingStationPagesSlice attribute has been added to <IndexingSettings>. Pages of multi-page documents arrive at the Indexing Station in sets containing a specific number of pages. The value of this attribute indicates the number of pages per set. The default value is 5.
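The page identifiers described in the change log (the Id attribute of <InputFile>, the embedded <Page> tag with Id and PageNumber, and the <FileId> and <PageId> tags inside <Pages>) can be joined to trace each output page back to its source. The following is a minimal sketch with an invented sample. The `Name` attribute and the per-page wrapper element inside <Pages> are assumptions; the code pairs <FileId> and <PageId> values in document order, so it does not depend on that wrapper.

```python
import xml.etree.ElementTree as ET

# Invented sample; the Name attribute and the <Page> wrapper inside <Pages>
# are assumptions made for illustration.
SAMPLE_PAGES = """\
<XMLResult>
  <InputFile Id="f1" Name="scan_a.tif">
    <Page Id="p1" PageNumber="1"/>
    <Page Id="p2" PageNumber="2"/>
  </InputFile>
  <InputFile Id="f2" Name="scan_b.tif">
    <Page Id="p3" PageNumber="1"/>
  </InputFile>
  <JobDocument>
    <Pages>
      <Page><FileId>f1</FileId><PageId>p2</PageId></Page>
      <Page><FileId>f2</FileId><PageId>p3</PageId></Page>
    </Pages>
  </JobDocument>
</XMLResult>
"""


def map_output_pages(xml_text: str) -> list[tuple]:
    """Return (output page index, input file name, input page number) tuples."""
    root = ET.fromstring(xml_text)

    # Index input pages by (file Id, page Id).
    source = {}
    for input_file in root.iter("InputFile"):
        for page in input_file.findall("Page"):
            key = (input_file.get("Id"), page.get("Id"))
            source[key] = (input_file.get("Name"), int(page.get("PageNumber")))

    mapping = []
    for document in root.iter("JobDocument"):
        pages = document.find("Pages")
        if pages is None:
            continue
        # Pair the identifiers in document order.
        file_ids = [e.text for e in pages.iter("FileId")]
        page_ids = [e.text for e in pages.iter("PageId")]
        for index, key in enumerate(zip(file_ids, page_ids), start=1):
            # Unknown identifiers map to (None, None).
            mapping.append((index, *source.get(key, (None, None))))
    return mapping


if __name__ == "__main__":
    for index, name, number in map_output_pages(SAMPLE_PAGES):
        print(f"output page {index} <- {name}, page {number}")
```

The output page index restarts at 1 for each <JobDocument>, on the assumption that the order of entries inside <Pages> matches the page order of the assembled document.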
26.03.2024 13:49:49