How to Process Office Documents
ABBYY FineReader Server allows you to process Office documents (e.g. DOC, DOCX, XLS, XLSX, PPT, and PPTX files) in the same workflow as image files.
Processing Office documents in ABBYY FineReader Server
By default, Office documents are processed by the Support for Office File Formats component, which can be installed together with the Processing Station component.
In ABBYY FineReader Server, you can also process Office documents using a script that will convert them to image files, or using the API.
Processing Office documents using Microsoft Office or LibreOffice
For best results, you can process Office documents using a third-party application installed on the same computer as ABBYY FineReader Server. ABBYY FineReader Server supports integration with Microsoft Office 2016 or later and LibreOffice 4 or later.
Note. To ensure proper integration with ABBYY FineReader Server, install LibreOffice to the default folder used by the LibreOffice installer on the system disk.
Note. In some cases, Office document conversion may fail due to problems with certain versions of LibreOffice (see the ABBYY knowledge base for details).
To set up the processing of Office documents in ABBYY FineReader Server, complete the following steps:
- Select the Preprocessing or Processing and Preprocessing role for one or more Processing Stations. Microsoft Office or LibreOffice must be installed on computers running the Processing Stations with these roles.
- Open the 2. Process tab of the Workflow Properties dialog box. In the Office documents processing mode drop-down list, select the program (MS Office or LibreOffice) that will be used to open Office documents.
- Specify the credentials (login and password) of the user account under which the program will run:
  - If the FineReader Server Processing Station service is running under the Local System account and Microsoft Office 2013, 2016 or 2019 is to be used, specify the login and password of the user account.
  - If the FineReader Server Processing Station service is running under a user account and Microsoft Office 2013 is to be used, no login or password is required.
  - If the FineReader Server Processing Station service is running under a user account and Microsoft Office 2016 or 2019 is to be used, restart the service under the Local System account and specify the login and password of the user account.
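The three credential rules above can be read as a small decision table. The following Python sketch only restates those rules for illustration; the function name, the CredentialAdvice type, and the "LocalSystem" label are hypothetical and are not part of FineReader Server.

```python
from dataclasses import dataclass

# Hypothetical label for the Local System account; any other value is
# treated as "a user account" in this sketch.
LOCAL_SYSTEM = "LocalSystem"

# Only the Microsoft Office versions named in the rules above are covered.
COVERED_VERSIONS = (2013, 2016, 2019)


@dataclass(frozen=True)
class CredentialAdvice:
    credentials_required: bool        # must a login and password be specified?
    restart_under_local_system: bool  # must the service be restarted first?


def credential_advice(service_account: str, office_version: int) -> CredentialAdvice:
    """Restate the three rules for the Processing Station service account."""
    if office_version not in COVERED_VERSIONS:
        raise ValueError(f"rules do not cover Microsoft Office {office_version}")
    if service_account == LOCAL_SYSTEM:
        # Local System account: credentials are required for all covered versions.
        return CredentialAdvice(True, False)
    if office_version == 2013:
        # User account with Office 2013: nothing to specify.
        return CredentialAdvice(False, False)
    # User account with Office 2016 or 2019: restart the service under
    # Local System and specify the credentials of the user account.
    return CredentialAdvice(True, True)
```

For example, credential_advice("svc_user", 2019) reports that the service should be restarted under the Local System account and that credentials are required.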
Note. Files in the following formats will be processed by default: DOC, DOCX, ODT, HTM, HTML, TXT, RTF, XLS, XLSX, ODS, PPT, PPTX, ODP.
Note. The list of supported formats can be changed. To change it, edit the configuration file (%PROGRAMDATA%\ABBYY FineReader Server 14\Configuration.xml) and specify the set of file extensions for the program you selected for opening Office documents (see the WordFilesMask, ExcelFilesMask, PowerPointFilesMask, and VisioFilesMask attributes of the \OnFileReceivedCustomOffice\CustomOfficeConversionParams\CustomOfficeApplications tag). For example, if you specify WordFilesMask="*.doc;*.docx;", DOC and DOCX files will be opened using Microsoft Office Word or LibreOffice Writer (depending on which program you selected), but no program will be found for processing RTF files.
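To see why RTF files fall through in the example above, the mask can be modeled as a semicolon-separated list of wildcard patterns. The Python sketch below is an illustration only: the helper names are hypothetical, and it assumes that masks are matched case-insensitively against the file name, which may differ from how FineReader Server actually evaluates them.

```python
from fnmatch import fnmatch


def parse_mask(mask: str) -> list[str]:
    """Split a mask such as '*.doc;*.docx;' into patterns, dropping empty parts."""
    return [part.strip() for part in mask.split(";") if part.strip()]


def matches_mask(file_name: str, mask: str) -> bool:
    """Return True if the file name matches any pattern in the mask.

    Assumption: matching is case-insensitive, so both sides are lowercased.
    """
    name = file_name.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in parse_mask(mask))


# The value from the example above.
WORD_FILES_MASK = "*.doc;*.docx;"

for candidate in ("contract.doc", "Report.DOCX", "letter.rtf"):
    status = "matched" if matches_mask(candidate, WORD_FILES_MASK) else "not matched"
    print(f"{candidate}: {status}")
```

Under these assumptions, contract.doc and Report.DOCX match the mask, while letter.rtf does not, so no program would be selected for it.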
Note. Office documents will not be read by a third-party program if:
- The third-party program is unlicensed or its license has not been activated.
- The third-party program is not the default program for opening Office documents.
Copying electronic documents to the Output folder
Electronic documents can be copied to the output folder without conversion and recognition, so that the page counter for your license will not decrement. The following conditions must be met:
- Input files must have any of the following file extensions: *.doc, *.docx, *.xls, *.xlsx, *.rtf, or *.txt.
- The format of each output file must be the same as the format of the corresponding input file, and input files must not be exported to any other formats.
- On the 3. Document Separation tab:
  - The Create one document for each file in job option must be selected.
  - The Delete blank pages option must not be selected.
- On the 4. Quality Control tab, the No verification option must be selected.
- On the 5. Indexing tab, no document types must be specified.
Note. If all of the above conditions are met with the exception of the last one, i.e. if one or more document types are specified on the 5. Indexing tab, the following operations will be performed:
- Input files will be converted, recognized, and indexed.
- The page counter for your license will be decremented by the corresponding number of pages.
- The input files with the attributes assigned to them by indexing will be copied to the Output folder.
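The conditions and the indexing exception above can be summarized as a three-way outcome. The Python sketch below only restates those rules; the WorkflowSettings fields and the outcome labels are hypothetical names chosen for illustration and do not correspond to actual FineReader Server settings or API objects.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Extensions listed in the conditions above.
COPYABLE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".rtf", ".txt"}


@dataclass
class WorkflowSettings:
    """Hypothetical summary of the workflow options named in the conditions."""
    output_format_same_as_input: bool  # same format, no other export formats
    one_document_per_file: bool        # 3. Document Separation tab
    delete_blank_pages: bool           # 3. Document Separation tab
    no_verification: bool              # 4. Quality Control tab
    document_types: list[str] = field(default_factory=list)  # 5. Indexing tab


def copy_outcome(file_name: str, settings: WorkflowSettings) -> str:
    """Classify what happens to an input file according to the rules above."""
    extension = PurePath(file_name).suffix.lower()
    base_conditions_met = (
        extension in COPYABLE_EXTENSIONS
        and settings.output_format_same_as_input
        and settings.one_document_per_file
        and not settings.delete_blank_pages
        and settings.no_verification
    )
    if not base_conditions_met:
        return "processed normally"
    if settings.document_types:
        # Pages are counted, but the input file itself is what gets copied.
        return "converted, recognized, indexed; input copied; pages counted"
    return "copied without processing; pages not counted"
```

For example, a .docx file in a workflow that satisfies every condition and has no document types on the 5. Indexing tab is classified as copied without processing.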
It is possible to have output and input files in the same format but with different format settings (e.g. with different page sizes specified for each). By default, the output format settings will be ignored and the input file will be copied to the Output folder as is. If these input files also need to be processed, modify the XML ticket for the appropriate workflow accordingly. To do so, export the workflow, specify IsExportSrcEdocAllowed=false in the export parameters in the XML file, and import the XML ticket into FineReader Server. For more information, see How to Modify Workflow Settings in an XML Ticket.
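If the exported ticket is edited by a script rather than by hand, the change could look like the Python sketch below. It assumes that the IsExportSrcEdocAllowed attribute is already present on some element of the exported XML and only changes its value; the sample ticket and its element names (Ticket, ExportParams) are made up for illustration and do not reflect the real ticket schema.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE = "IsExportSrcEdocAllowed"

# Hypothetical, heavily simplified stand-in for an exported workflow ticket.
SAMPLE_TICKET = '<Ticket><ExportParams IsExportSrcEdocAllowed="true" /></Ticket>'


def disable_source_copy(ticket_xml: str) -> tuple[str, int]:
    """Set IsExportSrcEdocAllowed to "false" wherever it already occurs.

    Returns the modified XML text and the number of elements changed.
    No attribute is added to elements that do not already carry it.
    """
    root = ET.fromstring(ticket_xml)
    changed = 0
    for element in root.iter():
        if ATTRIBUTE in element.attrib:
            element.set(ATTRIBUTE, "false")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


modified_xml, changed_count = disable_source_copy(SAMPLE_TICKET)
print(changed_count, modified_xml)
```

Note that ElementTree does not preserve the XML declaration or comments, so compare the result with the exported file before importing it. A count of zero means the attribute was not found, in which case the ticket should be edited manually as described in How to Modify Workflow Settings in an XML Ticket.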