One of the main recognition parameters is the language to be used for recognition. The recognition languages can be specified for the entire document and for individual data fields.
Recognition language for the entire document
The recognition language for the entire document is specified in the Document Definition. To specify the recognition language for a document, use the IDocumentDefinition::DefaultLanguage property.
The recognition language used for the entire document is usually one of the predefined languages or a group of such languages. Predefined languages are the languages that ABBYY FlexiCapture SDK supports by default. They are available as a collection of Language objects, represented by the PredefinedLanguages object, which is accessible via the PredefinedLanguages property of the Engine object.
To change the language used for recognition of a document, do the following:
- Find the necessary predefined language in the collection of available languages. Use the FindLanguage method of the PredefinedLanguages subobject of the Engine object. Predefined languages are identified by their internal names. For the list of internal names, see Predefined Languages in ABBYY FlexiCapture SDK.
- Assign the received Language object to the IDocumentDefinition::DefaultLanguage property.
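The two steps above can be sketched as follows. This is a minimal illustration, not SDK code: the Mock* types are stand-ins that mirror only the names mentioned in the text, so that the sketch compiles without the SDK. The helper SetDocumentLanguage is hypothetical, and it assumes that FindLanguage yields an empty result for an unknown internal name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in types, NOT the SDK interfaces. They mirror only the names
// used in the text: PredefinedLanguages, FindLanguage, DefaultLanguage.
struct MockLanguage {
    std::string InternalName;
};
using LanguagePtr = std::shared_ptr<MockLanguage>;

struct MockPredefinedLanguages {
    std::map<std::string, LanguagePtr> items;

    // Assumption: an unknown internal name yields a null result.
    LanguagePtr FindLanguage(const std::string& internalName) const {
        auto it = items.find(internalName);
        if (it == items.end()) return nullptr;
        return it->second;
    }
};

struct MockEngine {
    MockPredefinedLanguages PredefinedLanguages;
};

struct MockDocumentDefinition {
    LanguagePtr DefaultLanguage;
};

// Hypothetical helper: find a predefined language by its internal name
// and make it the default language of the Document Definition.
bool SetDocumentLanguage(const MockEngine& engine,
                         MockDocumentDefinition& definition,
                         const std::string& internalName) {
    assert(!internalName.empty());
    LanguagePtr language = engine.PredefinedLanguages.FindLanguage(internalName);
    if (!language) return false;  // keep the current language unchanged
    definition.DefaultLanguage = language;
    return true;
}
```

In real code the same sequence would go through the COM interfaces of the Engine and the Document Definition, and the way a failed lookup is reported may differ from this stand-in.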
Languages may be simple (e.g. English, Russian) or compound (composed of several simple languages). See the Type property of the Language object. The Language object exposes methods that cast it to a type-specific interface and thereby provide access to the extended attributes of a language of a specific type.
Changing the recognition language for individual fields
The recognition language can be changed for an individual field by setting the recognition parameters of this field. The IFieldDefinition::RecognitionParams property provides access to the parameters of field recognition. The recognition language can be set for text fields only (IFieldDefinition::Type = FT_TextField).
To tune the recognition parameters of a text field, obtain the RecognitionParams object as a TextRecognitionParams object by calling the AsTextParams method of the RecognitionParams object.
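Access to the text recognition parameters of a field might look like the sketch below. The Mock* types are stand-ins rather than SDK interfaces, FT_OtherField is a placeholder for every non-text field type, and the helper GetTextParams is hypothetical.

```cpp
#include <cassert>

// FT_TextField comes from the text; FT_OtherField is a placeholder
// for all other field types and is not an SDK constant.
enum FieldTypeEnum { FT_TextField, FT_OtherField };

// Stand-in types, NOT the SDK interfaces. Language-related members
// are omitted here.
struct MockTextRecognitionParams {};

struct MockRecognitionParams {
    MockTextRecognitionParams text;
    MockTextRecognitionParams* AsTextParams() { return &text; }
};

struct MockFieldDefinition {
    FieldTypeEnum Type;
    MockRecognitionParams RecognitionParams;
};

// Hypothetical helper: returns the text recognition parameters of a
// field, or nullptr when the field is not a text field.
MockTextRecognitionParams* GetTextParams(MockFieldDefinition& field) {
    if (field.Type != FT_TextField) return nullptr;
    MockTextRecognitionParams* params = field.RecognitionParams.AsTextParams();
    assert(params != nullptr);  // the stand-in always succeeds for text fields
    return params;
}
```

Checking the field type first keeps the language-related code away from fields for which the recognition language cannot be set.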
To use a group of languages for recognition...
- Create a group language with the CreateEmbeddedLanguage method of the TextRecognitionParams object. Pass an LT_Group constant as the first parameter of this method.
- Obtain the created object as a GroupLanguage object using the AsGroupLanguage method of the Language object.
- Add languages to this group language. Use the Add method of the GroupLanguage object.
- Assign the created language to the Language property of the TextRecognitionParams object.
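The steps for a group language can be sketched as follows. The Mock* types are stand-ins that mirror the names in the text and are not the SDK interfaces. The enumeration name LanguageTypeEnum, the single-parameter form of CreateEmbeddedLanguage, and the helper UseGroupLanguage are assumptions made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Constant names come from the text; the enumeration name is assumed.
enum LanguageTypeEnum { LT_Simple, LT_Group };

struct MockLanguage;
using LanguagePtr = std::shared_ptr<MockLanguage>;

// Stand-in types, NOT the SDK interfaces.
struct MockGroupLanguage {
    std::vector<LanguagePtr> members;
    void Add(const LanguagePtr& language) { members.push_back(language); }
};

struct MockLanguage {
    LanguageTypeEnum Type;
    std::string InternalName;
    MockGroupLanguage group;  // meaningful only when Type == LT_Group

    MockGroupLanguage* AsGroupLanguage() {
        return Type == LT_Group ? &group : nullptr;
    }
};

struct MockTextRecognitionParams {
    LanguagePtr Language;

    // Assumed signature: the text only states that the language type
    // is the first parameter.
    LanguagePtr CreateEmbeddedLanguage(LanguageTypeEnum type) {
        auto language = std::make_shared<MockLanguage>();
        language->Type = type;
        return language;
    }
};

// Hypothetical helper: create a group language, fill it, assign it.
void UseGroupLanguage(MockTextRecognitionParams& params,
                      const std::vector<LanguagePtr>& parts) {
    LanguagePtr language = params.CreateEmbeddedLanguage(LT_Group);
    MockGroupLanguage* group = language->AsGroupLanguage();
    assert(group != nullptr);
    for (const LanguagePtr& part : parts) group->Add(part);
    params.Language = language;
}
```

The languages passed in `parts` would typically be predefined languages found by their internal names.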
To use regular expressions for recognition...
- Create a simple language with the CreateEmbeddedLanguage method of the TextRecognitionParams object. Pass an LT_Simple constant as the first parameter of this method.
- Obtain the created object as a SimpleLanguage object using the AsSimpleLanguage method of the Language object.
- The created simple language is empty. You must specify its alphabet. Pass the set of alphabet letters to the LetterSet property of the SimpleLanguage object with an LLS_Alphabet constant as a parameter.
- Set the regular expression via the RegularExpression property of the SimpleLanguage object. The semantics of ABBYY FlexiCapture regular expressions is described in Working with Regular Expressions.
- Assign the created language to the Language property of the TextRecognitionParams object.
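A sketch of the steps for a language with a regular expression is given below. The Mock* types are stand-ins, not the SDK interfaces. The LetterSet property with a parameter is modelled here as a map, the enumeration names are assumed, and the helper UseRegularExpression is hypothetical. The alphabet and the pattern are left to the caller because the regular expression syntax is described in Working with Regular Expressions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Constant names come from the text; the enumeration names are assumed.
enum LanguageTypeEnum { LT_Simple, LT_Group };
enum LanguageLetterSetEnum { LLS_Alphabet };

// Stand-in types, NOT the SDK interfaces.
struct MockSimpleLanguage {
    // The parameterized LetterSet property is modelled as a map.
    std::map<LanguageLetterSetEnum, std::string> LetterSet;
    std::string RegularExpression;
};

struct MockLanguage {
    LanguageTypeEnum Type = LT_Simple;
    MockSimpleLanguage simple;  // meaningful only when Type == LT_Simple

    MockSimpleLanguage* AsSimpleLanguage() {
        return Type == LT_Simple ? &simple : nullptr;
    }
};
using LanguagePtr = std::shared_ptr<MockLanguage>;

struct MockTextRecognitionParams {
    LanguagePtr Language;

    // Assumed signature: only the first parameter is described in the text.
    LanguagePtr CreateEmbeddedLanguage(LanguageTypeEnum type) {
        auto language = std::make_shared<MockLanguage>();
        language->Type = type;
        return language;
    }
};

// Hypothetical helper: create an empty simple language, set its alphabet
// and regular expression, and assign it to the field parameters.
void UseRegularExpression(MockTextRecognitionParams& params,
                          const std::string& alphabet,
                          const std::string& pattern) {
    LanguagePtr language = params.CreateEmbeddedLanguage(LT_Simple);
    MockSimpleLanguage* simple = language->AsSimpleLanguage();
    assert(simple != nullptr);
    simple->LetterSet[LLS_Alphabet] = alphabet;  // a new language is empty
    simple->RegularExpression = pattern;
    params.Language = language;
}
```

The alphabet is set before the regular expression because a newly created simple language contains no letters.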
To use user dictionaries for recognition...
- Create a simple language with the CreateEmbeddedLanguage method of the TextRecognitionParams object. Pass an LT_Simple constant as the first parameter of this method. You may create this language on the basis of a predefined language. If you do not use any predefined language as a prototype, you must then set the alphabet of the language.
- Obtain the created object as a SimpleLanguage object using the AsSimpleLanguage method of the Language object.
- Set the UseUserDefinedDictionary property of the SimpleLanguage object to TRUE. You may use both a user dictionary and a predefined dictionary for recognition. If you want to use only a user dictionary, set the UsePredefinedDictionary property to FALSE.
- Obtain the user dictionary via the UserDefinedDictionary property of the SimpleLanguage object. The user dictionary is represented by the Dictionary object. Using the methods and properties of this object, you may add words to the dictionary, delete words from it, and load an existing dictionary via the provided user interface.
- Assign the created language to the Language property of the TextRecognitionParams object.
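The user dictionary steps can be sketched as follows, under several assumptions. The Mock* types are stand-ins, not the SDK interfaces. The prototype language is assumed to be passed as a second parameter of CreateEmbeddedLanguage, the method name AddWord and the helper UseUserDictionary are hypothetical, and the stand-in copies the prototype so that the new language inherits its alphabet.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

enum LanguageTypeEnum { LT_Simple, LT_Group };  // enumeration name assumed

// Stand-in types, NOT the SDK interfaces.
struct MockDictionary {
    std::set<std::string> words;
    void AddWord(const std::string& word) { words.insert(word); }  // hypothetical name
};

struct MockSimpleLanguage {
    std::string alphabet;  // stand-in for the alphabet letter set
    bool UseUserDefinedDictionary = false;
    bool UsePredefinedDictionary = true;
    MockDictionary UserDefinedDictionary;
};

struct MockLanguage {
    LanguageTypeEnum Type = LT_Simple;
    MockSimpleLanguage simple;

    MockSimpleLanguage* AsSimpleLanguage() {
        return Type == LT_Simple ? &simple : nullptr;
    }
};
using LanguagePtr = std::shared_ptr<MockLanguage>;

struct MockTextRecognitionParams {
    LanguagePtr Language;

    // Assumption: the prototype is an optional second parameter, and the
    // new language starts as a copy of it.
    LanguagePtr CreateEmbeddedLanguage(LanguageTypeEnum type,
                                       const LanguagePtr& prototype = nullptr) {
        auto language = prototype ? std::make_shared<MockLanguage>(*prototype)
                                  : std::make_shared<MockLanguage>();
        language->Type = type;
        return language;
    }
};

// Hypothetical helper: create a simple language from a prototype, enable
// the user dictionary, fill it, and assign the language.
void UseUserDictionary(MockTextRecognitionParams& params,
                       const LanguagePtr& prototype,
                       const std::vector<std::string>& words,
                       bool userDictionaryOnly) {
    LanguagePtr language = params.CreateEmbeddedLanguage(LT_Simple, prototype);
    MockSimpleLanguage* simple = language->AsSimpleLanguage();
    assert(simple != nullptr);
    simple->UseUserDefinedDictionary = true;
    if (userDictionaryOnly) simple->UsePredefinedDictionary = false;
    for (const std::string& word : words) simple->UserDefinedDictionary.AddWord(word);
    params.Language = language;
}
```

When no prototype is passed, the alphabet of the new language would have to be set explicitly before the language is used.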
To use a special recognition language for a data type...
It is not always convenient to specify the type of the field value (IFieldDefinition::ValueType), because this restricts the values the field accepts. For example, if a user enters incorrect data into such a field, an error occurs. To avoid specifying a special type of field value, you may:
- Create a simple language with the CreateEmbeddedLanguageByDataType method of the TextRecognitionParams object. Pass a FieldValueTypeEnum constant as the parameter of this method.
- Set the necessary parameters of the created language.
- Assign the created language to the Language property of the TextRecognitionParams object.
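A sketch of the steps above follows. The Mock* types are stand-ins, not the SDK interfaces. FVT_ExampleText and FVT_ExampleDate are placeholder constants and not actual FieldValueTypeEnum values, and the helper UseDataTypeLanguage and the `tune` callback are hypothetical. The callback stands for the step in which the necessary parameters of the created language are set.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Placeholder constants: the real FieldValueTypeEnum values are not
// listed in the text.
enum FieldValueTypeEnum { FVT_ExampleText, FVT_ExampleDate };

// Stand-in types, NOT the SDK interfaces.
struct MockSimpleLanguage {
    bool UsePredefinedDictionary = true;
};

struct MockLanguage {
    FieldValueTypeEnum createdFor = FVT_ExampleText;  // bookkeeping for the sketch
    MockSimpleLanguage simple;
    MockSimpleLanguage* AsSimpleLanguage() { return &simple; }
};
using LanguagePtr = std::shared_ptr<MockLanguage>;

struct MockTextRecognitionParams {
    LanguagePtr Language;

    LanguagePtr CreateEmbeddedLanguageByDataType(FieldValueTypeEnum valueType) {
        auto language = std::make_shared<MockLanguage>();
        language->createdFor = valueType;
        return language;
    }
};

// Hypothetical helper: create a language for a data type, let the caller
// adjust it, and assign it to the field parameters.
void UseDataTypeLanguage(MockTextRecognitionParams& params,
                         FieldValueTypeEnum valueType,
                         const std::function<void(MockSimpleLanguage&)>& tune) {
    LanguagePtr language = params.CreateEmbeddedLanguageByDataType(valueType);
    MockSimpleLanguage* simple = language->AsSimpleLanguage();
    assert(simple != nullptr);
    if (tune) tune(*simple);  // set the necessary parameters, if any
    params.Language = language;
}
```

With this approach the value type of the field itself stays unrestricted, while recognition still uses a language suited to the expected data.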
See also
Language
PredefinedLanguages