XML Result

An XML result file contains information about the parameters and results of job execution. An XML result file is created for both successful and failed jobs.

On the 6. Output tab of the Workflow Properties dialog box, you can specify a folder where the XML result file should be published.

Important! The name of the XML result file must not contain more than 64 characters.

If errors occur during job execution and the job is marked as Failed, the XML result file and the unprocessed image files are saved to the location specified under Save failed jobs to on the 4. Quality Control tab of the Workflow Properties dialog box.

Main XML result tags

Tag Description
<XMLResult>

This is the root tag. Its attributes contain the following information:

  • Id - the identifier of the job
  • IsFailed - indicates whether the job failed or not
  • Priority - indicates the priority of the job
  • Date - the date and time when the job was accepted for processing
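As an illustration, the following Python sketch reads these root attributes with the standard-library xml.etree.ElementTree module. The sample XML is hypothetical and heavily abbreviated. The sketch assumes that the file uses no XML namespace and that IsFailed is written as the string true or false; adjust these assumptions to match your actual result files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated XML result (not real FineReader Server output).
SAMPLE = """<XMLResult Id="42" IsFailed="false" Priority="Normal"
           Date="26.03.2024 13:49:49"/>"""


def job_summary(xml_text: str) -> dict:
    """Return the job-level attributes stored on the <XMLResult> root tag."""
    root = ET.fromstring(xml_text)
    if root.tag != "XMLResult":
        # A namespaced file would report a tag like "{uri}XMLResult" (assumption: no namespace).
        raise ValueError(f"unexpected root tag: {root.tag}")
    return {
        "id": root.get("Id"),
        # Assumption: IsFailed is serialized as "true" or "false".
        "failed": root.get("IsFailed", "false").strip().lower() == "true",
        "priority": root.get("Priority"),
        # Kept as a string because the date format is not specified here.
        "date": root.get("Date"),
    }


if __name__ == "__main__":
    print(job_summary(SAMPLE))
```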
<InputFile>

The attributes of this tag provide general information about the input file:

  • Name - the name of the file
  • Id - the identifier of the file
  • FileModificationTime - the date and time when the file was created

The tags embedded into <InputFile> contain the following information:

  • <Statistics> - file statistics (total number of characters, number of characters recognized with low confidence, number of pages)
  • <Page> - the attributes of this tag establish a correspondence between input and output files (Id is the identifier of the page, PageNumber is the number of the page in the file)

Note. If an output file is obtained by copying an input file rather than by performing OCR, this is indicated by means of an <OutputDocuments> tag inside the corresponding <InputFile> tag.

Note. If an error occurs when processing an input image file, an <Error> tag will be put inside the <InputFile> tag. The <Error> tag will contain a description of the error.
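A minimal Python sketch of how a script might summarize the <InputFile> tags: it lists each input file with its pages, flags files that were copied rather than recognized (an <OutputDocuments> child, as noted above), and extracts any error description. The sample is hypothetical, and reading the error description from the text content of <Error> is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated sample; names and identifiers are made up.
SAMPLE = """<XMLResult Id="42" IsFailed="true">
  <InputFile Name="scan1.tif" Id="f1">
    <Page Id="p1" PageNumber="1"/>
    <Page Id="p2" PageNumber="2"/>
  </InputFile>
  <InputFile Name="report.pdf" Id="f2">
    <OutputDocuments/>
  </InputFile>
  <InputFile Name="scan2.tif" Id="f3">
    <Error>Cannot open the image file.</Error>
  </InputFile>
</XMLResult>"""


def list_input_files(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    files = []
    # iter() walks all descendants, so the exact nesting of <InputFile> does not matter.
    for f in root.iter("InputFile"):
        error = f.find("Error")
        files.append({
            "name": f.get("Name"),
            "id": f.get("Id"),
            # Page Id -> page number within this input file.
            "pages": {p.get("Id"): int(p.get("PageNumber")) for p in f.findall("Page")},
            # An <OutputDocuments> child means the file was copied rather than recognized.
            "copied": f.find("OutputDocuments") is not None,
            # Assumption: the error description is the text content of <Error>.
            "error": (error.text or "").strip() if error is not None else None,
        })
    return files


for info in list_input_files(SAMPLE):
    print(info["name"], "copied" if info["copied"] else "", info["error"] or "OK")
```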

<JobDocument>

This tag provides information about the document which was obtained by grouping together the processed input files. Depending on your document assembly settings, this document may consist of only one input file, of multiple input files, or of certain pages taken from multiple input files.

The number of <OutputDocuments> tags equals the number of output files processed within the given job.

The tags embedded into <OutputDocuments> contain the following information about the output files:

  • <FileName> - the name of the file
  • <FormatSettings> - the attributes of this tag contain the export settings
  • <OutputLocation> - the path to the Output folder
  • <NamingRule> - the file naming rule
  • <Pages> - information about each page of the output file (there is one <Pages> tag per output page; the tags embedded into <Pages> indicate the input file and page from which that output page was obtained)
  • <Statistics> - job statistics
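The sketch below shows one way to list the output files of a job from the <OutputDocuments> tags. It only searches under <JobDocument>, because an <OutputDocuments> tag inside <InputFile> marks a copied input file instead. The sample is hypothetical, and the sketch assumes that <FileName> and <OutputLocation> carry their values as text and that the output location is a Windows file-system path.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

# Hypothetical, abbreviated sample (raw string so the backslash is kept literally).
SAMPLE = r"""<XMLResult Id="42" IsFailed="false">
  <JobDocument>
    <OutputDocuments>
      <FileName>scan1.pdf</FileName>
      <FormatSettings RewriteIfFileExists="true"/>
      <OutputLocation>C:\Output</OutputLocation>
      <Pages><FileId>f1</FileId><PageId>p1</PageId></Pages>
      <Pages><FileId>f1</FileId><PageId>p2</PageId></Pages>
    </OutputDocuments>
  </JobDocument>
</XMLResult>"""


def list_output_files(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    outputs = []
    # Only <OutputDocuments> under <JobDocument> describe produced output files.
    for job_doc in root.iter("JobDocument"):
        for doc in job_doc.findall("OutputDocuments"):
            name = (doc.findtext("FileName") or "").strip()
            location = (doc.findtext("OutputLocation") or "").strip()
            settings = doc.find("FormatSettings")
            outputs.append({
                # Assumption: the output location is a Windows file-system path.
                "path": str(PureWindowsPath(location) / name),
                # One <Pages> tag per output page.
                "page_count": len(doc.findall("Pages")),
                "format_settings": dict(settings.attrib) if settings is not None else {},
            })
    return outputs


for out in list_output_files(SAMPLE):
    print(out["path"], out["page_count"], out["format_settings"])
```

Output locations that are not file-system paths would need different handling.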

The following tags are used when processing e-mail messages. They show whether the document contains the body of an e-mail message or an attachment and whether it was obtained from a main message or from an attached message. The tags are used for e-mail files, which may either be received by e-mail or taken from the Input folder.

  • <IsMailBodyFile> - indicates whether the document contains the body of the main e-mail message
  • <IsMailAttachedMessageFile> - indicates whether the document contains the body of an attached e-mail message or an attachment of that message

Note. If a single document containing all text from an e-mail message and its attachments is created for the job, then IsMailBodyFile = true and IsMailAttachedMessageFile = false.

Note. If the document contains only the text of a file attached to the main message, then IsMailBodyFile = false and IsMailAttachedMessageFile = false.

Note. The IsMailBodyFile and IsMailAttachedMessageFile properties are included in the XML result file only if their values are not false, so a missing property means that its value is false.
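A minimal sketch of how a script might interpret the two e-mail flags according to the notes above. It assumes that each flag appears as a tag whose text is true somewhere inside <JobDocument> (the exact position and serialization are assumptions) and treats a missing tag as false. Both flags being false is ambiguous, since it also describes documents that did not come from e-mail at all.

```python
import xml.etree.ElementTree as ET


def _flag(job_document: ET.Element, tag: str) -> bool:
    # Assumption: each flag is a tag whose text is "true"; because flags are
    # written only when they are not false, a missing tag is read as false.
    el = job_document.find(f".//{tag}")
    return el is not None and (el.text or "").strip().lower() == "true"


def classify_mail_document(job_document: ET.Element) -> str:
    body = _flag(job_document, "IsMailBodyFile")
    attached = _flag(job_document, "IsMailAttachedMessageFile")
    if body:
        return "body of the main message"
    if attached:
        return "attached message or one of its attachments"
    # Both flags false: an attachment of the main message, or not an e-mail document.
    return "main-message attachment or non-e-mail document"


# Hypothetical <JobDocument> fragments.
main_body = ET.fromstring("<JobDocument><IsMailBodyFile>true</IsMailBodyFile></JobDocument>")
plain = ET.fromstring("<JobDocument/>")
print(classify_mail_document(main_body))
print(classify_mail_document(plain))
```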

<ImageProcessingParams>

The attributes of this tag contain information about the additional image processing settings, for example:

  • SplitDualPages - splitting of facing pages
  • ConvertToBWFormat - conversion of color and grayscale images to black-and-white
  • RotationType - image rotation
  • Deskew - correction of image skew
  • RemoveTexture - removal of background noise, etc.
<RecognitionParams>

The attributes and embedded tags of <RecognitionParams> contain the OCR settings, for example:

  • <Language> - the OCR languages
  • RecognitionQuality - the optimization method (prefer quality over speed or vice versa)
  • RecognitionMode - the recognition mode (recognize all text or recognize only barcodes)
<ExportParams>

The tags embedded into <ExportParams> contain the export parameters:

  • <ExportFormat> - the output formats of the processed images and their parameters
  • <OutputLocation> - the output destination
  • <XMLResultLocation> - the folder where XML result files are published
  • <NamingRule> - the file naming rule
<Statistics>

The attributes of this tag contain the general statistics for the processed files, combining the statistics for all executed jobs.
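Because the parameter tags above record the settings a job was processed with, comparing them across two result files can help explain why two jobs produced different output. The Python sketch below flattens the attributes of <ImageProcessingParams>, <RecognitionParams> and <ExportParams> and reports the differences. The sample XML and attribute values are hypothetical, and nested tags such as <Language> or <ExportFormat> are deliberately ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET

PARAM_TAGS = ("ImageProcessingParams", "RecognitionParams", "ExportParams")


def settings_snapshot(xml_text: str) -> dict:
    """Flatten the attributes of the processing-parameter tags into one dict."""
    root = ET.fromstring(xml_text)
    snapshot = {}
    for tag in PARAM_TAGS:
        el = root.find(f".//{tag}")
        if el is not None:
            for attr, value in el.attrib.items():
                snapshot[f"{tag}.{attr}"] = value
    return snapshot


def diff_settings(old_xml: str, new_xml: str) -> dict:
    """Return {key: (old value, new value)} for every setting that differs."""
    old, new = settings_snapshot(old_xml), settings_snapshot(new_xml)
    return {k: (old.get(k), new.get(k))
            for k in sorted(old.keys() | new.keys())
            if old.get(k) != new.get(k)}


# Hypothetical results of two jobs; the attribute values are made up.
job_a = '<XMLResult><ImageProcessingParams Deskew="true"/><RecognitionParams RecognitionMode="FullPage"/></XMLResult>'
job_b = '<XMLResult><ImageProcessingParams Deskew="false"/><RecognitionParams RecognitionMode="FullPage"/></XMLResult>'
print(diff_settings(job_a, job_b))
```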

XML result change log

  • An Id attribute has been added to <InputFile>, which contains the identifier of the input file.
  • An embedded <Page> tag has been added to <InputFile>, where Id is the identifier of the page of the input document and PageNumber is the number of the page in the input file.
  • An embedded <Pages> tag has been added to <JobDocument>. <Pages> has <FileId> and <PageId> tags embedded into it. <FileId> is the identifier of the input file and <PageId> is the identifier of the page of the input document from which the processed page was obtained.

The above changes have been introduced so that users can easily see which page of the input file corresponds to which page of the output file.
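The following Python sketch shows how these identifiers can be joined to map every output page back to its input file and page number. It indexes input pages by (file Id, page Id), so it does not rely on page identifiers being unique across files. It follows the <OutputDocuments> description, where <Pages> is nested inside <OutputDocuments>, and assumes that <Pages> tags appear in output page order. The sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two input files merged into one output document.
SAMPLE = """<XMLResult Id="42" IsFailed="false">
  <InputFile Name="a.tif" Id="f1">
    <Page Id="p1" PageNumber="1"/>
    <Page Id="p2" PageNumber="2"/>
  </InputFile>
  <InputFile Name="b.tif" Id="f2">
    <Page Id="p3" PageNumber="1"/>
  </InputFile>
  <JobDocument>
    <OutputDocuments>
      <FileName>merged.pdf</FileName>
      <Pages><FileId>f2</FileId><PageId>p3</PageId></Pages>
      <Pages><FileId>f1</FileId><PageId>p1</PageId></Pages>
    </OutputDocuments>
  </JobDocument>
</XMLResult>"""


def _text(el: ET.Element, tag: str) -> str:
    return (el.findtext(tag) or "").strip()


def page_map(xml_text: str) -> dict:
    """Map each output file name to a list of (input file name, input page number)."""
    root = ET.fromstring(xml_text)
    # Index input pages by (file Id, page Id).
    inputs = {}
    for f in root.iter("InputFile"):
        for p in f.findall("Page"):
            inputs[(f.get("Id"), p.get("Id"))] = (f.get("Name"), int(p.get("PageNumber")))
    result = {}
    for job_doc in root.iter("JobDocument"):
        for doc in job_doc.findall("OutputDocuments"):
            # Assumption: <Pages> tags appear in output page order.
            result[_text(doc, "FileName")] = [
                inputs.get((_text(p, "FileId"), _text(p, "PageId")))
                for p in doc.iter("Pages")
            ]
    return result


for output, sources in page_map(SAMPLE).items():
    for number, source in enumerate(sources, start=1):
        print(f"{output} page {number} <- {source}")
```

An output page whose identifiers match no input page is reported as None rather than raising an error.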

  • A RewriteIfFileExists attribute has been added to <FormatSettings>. If this attribute is set to true, output files that already exist in the Output folder are overwritten.
  • A SkipRecognizePdfsWithTextLayer attribute has been added to <ExportFormat>. If this attribute is set to true, the Do not modify files with high quality text layer option is enabled in the PDF dialog box.
  • A SkipRecognizePdfsWithTextLayerCoefficient attribute has been added to <ExportFormat>. This attribute lists the settings that determined how the program checked the quality of the text layer in input PDF files.
  • A ProhibitHiddenTextDetection attribute has been added to <RecognitionParams>. This attribute is set to true by default, which means that text in pictures in input PDF files is not recognized. If the attribute is set to false, the program recognizes text in pictures and creates a text layer in the output document.
  • A TiffMaxBrokenLastLinesCount attribute has been added to the <ImageProcessingParams> element. This attribute indicates whether the processing of damaged TIFF files was enabled or disabled.
  • An EnablePeriodicCrawling attribute has been added to <InputSettings>. If this attribute is set to true, the Crawl the library for new files every: N units of time option is enabled on the 1. Input tab of the Workflow Properties dialog box. The default value is false.
  • A CrawlingInterval attribute has been added to <InputSettings>. The value of this attribute indicates how often the program checks for new files in Document Library workflows.
  • A <BackgroundColorDetectionParams> tag with a BackgroundColorDetectionType attribute has been added to the <ImageProcessingParams> tag. This attribute indicates which color was used to fill the blank areas that appeared after images were deskewed. The default value is Auto.
  • An IndexingStationPagesSlice attribute has been added to <IndexingSettings>. Pages of multi-page documents arrive at the Indexing Station in sets containing a specific number of pages. The value of this attribute indicates the number of pages per set. The default value is 5.
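Result files produced before some of these attributes were introduced may not contain them. The Python sketch below reads a setting and falls back to the default stated in this change log when the attribute is missing. Treating a missing attribute as its default is an assumption rather than documented behavior, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET

# Defaults taken from the change log. Falling back to them when an attribute is
# missing (for example, in a file written before the attribute existed) is an
# assumption, not documented behavior.
DOCUMENTED_DEFAULTS = {
    ("RecognitionParams", "ProhibitHiddenTextDetection"): "true",
    ("InputSettings", "EnablePeriodicCrawling"): "false",
    ("BackgroundColorDetectionParams", "BackgroundColorDetectionType"): "Auto",
    ("IndexingSettings", "IndexingStationPagesSlice"): "5",
}


def read_setting(root: ET.Element, tag: str, attr: str):
    """Return the attribute value, the documented default, or None if neither exists."""
    el = root.find(f".//{tag}")
    if el is not None and attr in el.attrib:
        return el.attrib[attr]
    return DOCUMENTED_DEFAULTS.get((tag, attr))


# Hypothetical result file that lacks most of the attributes above.
root = ET.fromstring('<XMLResult><RecognitionParams ProhibitHiddenTextDetection="false"/></XMLResult>')
print(read_setting(root, "RecognitionParams", "ProhibitHiddenTextDetection"))
print(read_setting(root, "IndexingSettings", "IndexingStationPagesSlice"))
```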
See also

Architecture of ABBYY FineReader Server

XML Ticket

26.03.2024 13:49:49

