Recognizing with Training
ABBYY FineReader Engine can read texts set in practically any font regardless of print quality. Consequently, no prior training is normally required before recognition can take place. Nevertheless, ABBYY FineReader Engine features a number of user pattern training tools for special cases.
Pattern training works as follows. One or two pages are recognized in training mode, with the user entering the correct character values. This data is used to create a pattern: a set of pairs, each consisting of a character image and the character it represents. ABBYY FineReader Engine then uses this pattern as a source of additional information when recognizing the remaining text.
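The idea of a pattern as a set of image/character pairs can be illustrated with a small toy model. This is only a conceptual sketch, not how ABBYY FineReader Engine stores or matches patterns: the names ToyUserPattern, GlyphImage, and recognizeGlyph are hypothetical, the "image" is reduced to a text key, and exact-match lookup with a fallback guess is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a character image: a text key such as
// rows of '#' and '.' joined by '/'. A real engine works on bitmaps.
using GlyphImage = std::string;

// Toy model of a user pattern: pairs of "character image -> text".
class ToyUserPattern {
public:
    // Training step: the user supplies the correct text for an image.
    void train(const GlyphImage& image, const std::string& text) {
        pairs_[image] = text;
    }

    // Look up an image; returns true and fills 'text' if it was trained.
    bool lookup(const GlyphImage& image, std::string& text) const {
        std::map<GlyphImage, std::string>::const_iterator it = pairs_.find(image);
        if (it == pairs_.end()) {
            return false;
        }
        text = it->second;
        return true;
    }

    std::size_t size() const { return pairs_.size(); }

private:
    std::map<GlyphImage, std::string> pairs_;
};

// Assumption: a trained pair takes precedence over the guess produced
// without the pattern; otherwise the guess is kept unchanged.
std::string recognizeGlyph(const ToyUserPattern& pattern,
                           const GlyphImage& image,
                           const std::string& guessWithoutPattern) {
    std::string trained;
    if (pattern.lookup(image, trained)) {
        return trained;
    }
    return guessWithoutPattern;
}
```

In this model, training the key "#..#/####/#..#" as "H" makes recognizeGlyph return "H" for that key, while an untrained key returns whatever guess was passed in.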
Sometimes two or even three characters may get "stuck" together, and ABBYY FineReader Engine may be unable to enclose each character in an individual frame to separate them. If this proves to be the case (i.e., you cannot move the frame so that it contains only one whole character and no other character parts), you can train ABBYY FineReader Engine to recognize the inseparable character combination as a whole. Examples of character combinations frequently found stuck together include ff, fi, and fl. Such combinations are referred to as ligatures.
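A minimal sketch of what training a ligature means for the output text: one frame, and therefore one trained image, yields more than one character. The function readFrames and the image keys are hypothetical and exist only for illustration; they are not part of the ABBYY FineReader Engine API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical key identifying the image enclosed by one frame.
using FrameImage = std::string;

// Concatenate the text trained for each frame, in order.
// Frames with no trained entry are rendered as '?'.
std::string readFrames(const std::map<FrameImage, std::string>& trained,
                       const std::vector<FrameImage>& frames) {
    std::string text;
    for (std::vector<FrameImage>::const_iterator f = frames.begin();
         f != frames.end(); ++f) {
        std::map<FrameImage, std::string>::const_iterator it = trained.find(*f);
        if (it != trained.end()) {
            text += it->second;  // may be several characters for a ligature
        } else {
            text += "?";
        }
    }
    return text;
}
```

If the key "img_fi" is trained as "fi" and the keys "img_n" and "img_d" as "n" and "d", then three frames produce the four-character word "find".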
You can find additional information in Training User Patterns.
When to use
Train User Pattern mode may be useful when:
- recognizing texts set in decorative fonts
- recognizing texts containing unusual characters (e.g., mathematical symbols)
- recognizing large volumes (more than a hundred pages) of texts of low print quality
Use Train User Pattern mode only if one of the above applies. In other cases, you may obtain a slight increase in recognition quality, but the time and effort involved will probably outweigh the benefit received.
Keep the following limitations in mind:
- A pattern is useful only for documents that have the same font, font size, and resolution as the document used to create the user pattern.
- Pattern training is not supported for CJK languages. If one of these languages is selected for recognition, all user patterns (including those for other languages) are ignored.
- Pattern training cannot be performed when recognizing in parallel processes.
- Pattern training should be performed on pages with the correct page orientation because automatic page orientation detection does not work in this case.
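The limitations listed above can be checked before a training session is started. The sketch below is one possible pre-flight check in application code. TrainingPlan, DocumentProfile, trainingBlockers, and patternApplies are hypothetical names, not ABBYY FineReader Engine types or methods, and the application is assumed to know these facts about its own setup.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a planned training run.
struct TrainingPlan {
    bool usesCjkLanguage;        // a CJK recognition language is selected
    bool usesParallelProcesses;  // recognition runs in parallel processes
    bool orientationIsCorrect;   // pages are already correctly oriented
};

// Hypothetical summary of the properties a pattern depends on.
struct DocumentProfile {
    std::string font;
    int fontSizePt;
    int resolutionDpi;
};

// Return one message per limitation that the plan runs into.
std::vector<std::string> trainingBlockers(const TrainingPlan& plan) {
    std::vector<std::string> blockers;
    if (plan.usesCjkLanguage) {
        blockers.push_back("CJK language selected: user patterns are ignored");
    }
    if (plan.usesParallelProcesses) {
        blockers.push_back("parallel processes: pattern training cannot be performed");
    }
    if (!plan.orientationIsCorrect) {
        blockers.push_back("page orientation must be corrected before training");
    }
    return blockers;
}

// A pattern is expected to help only if font, size, and resolution match.
bool patternApplies(const DocumentProfile& trainedOn, const DocumentProfile& target) {
    return trainedOn.font == target.font &&
           trainedOn.fontSizePt == target.fontSizePt &&
           trainedOn.resolutionDpi == target.resolutionDpi;
}
```

An empty result from trainingBlockers means none of the listed limitations applies. patternApplies uses strict equality, which is a deliberately conservative reading of "the same font, font size, and resolution".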
How to recognize with training
- Create a RecognizerParams object.
- Set the IRecognizerParams::TrainUserPatterns property to TRUE.
- Create an empty user pattern file by using the IEngine::CreateEmptyUserPattern method.
- Specify the full path to this user pattern file in the IRecognizerParams::UserPatternsFile property.
- Call a recognition method (e.g., IFRDocument::Process) with these recognition parameters. Whenever an unknown character is encountered, the Pattern Training dialog will open, with the character image displayed within it.
- Train your pattern by recognizing one or more pages in Train User Pattern mode. Trained characters are saved in the user pattern file.
- [Optional] If you wish to edit this pattern, call the EditUserPattern method of the Engine object.
- Recognize the images by using this pattern.
Note: If the IRecognizerParams::UseBuiltInPatterns property is set to TRUE, then ABBYY FineReader Engine will use its own built-in patterns for recognition. Set this property to FALSE when you do not want to use the standard ABBYY FineReader Engine patterns for character recognition. This may be useful for the recognition of texts typed in decorative or non-standard fonts, in which case you can use your own user-defined patterns trained specifically for these fonts. If the UserPatternsFile property (where the path to the user-defined pattern file is stored) is empty, the UseBuiltInPatterns property is ignored.
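The interaction between UserPatternsFile and UseBuiltInPatterns described in the note can be modeled in a few lines. This is a sketch under stated assumptions: RecognizerSettings is a hypothetical plain struct that mirrors the three property names from the text, not the real IRecognizerParams interface, and "ignored" is interpreted here as "the built-in patterns are used".

```cpp
#include <string>

// Hypothetical mirror of three IRecognizerParams properties.
struct RecognizerSettings {
    bool trainUserPatterns;
    std::string userPatternsFile;  // empty means no user pattern is specified
    bool useBuiltInPatterns;

    RecognizerSettings()
        : trainUserPatterns(false), userPatternsFile(), useBuiltInPatterns(true) {}
};

// Assumption: with an empty UserPatternsFile the UseBuiltInPatterns value
// is ignored, so the built-in patterns remain in effect.
bool builtInPatternsInEffect(const RecognizerSettings& s) {
    if (s.userPatternsFile.empty()) {
        return true;
    }
    return s.useBuiltInPatterns;
}

// A user pattern can only take part when a file path has been set.
bool userPatternInEffect(const RecognizerSettings& s) {
    return !s.userPatternsFile.empty();
}
```

Under this model, setting useBuiltInPatterns to false has no effect until userPatternsFile is non-empty, which matches the last sentence of the note.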
C++ (COM) code
FREngine::IEnginePtr Engine;
FREngine::IFRDocumentPtr frDocument;
...
// Create a DocumentProcessingParams object
FREngine::IDocumentProcessingParamsPtr dpp = Engine->CreateDocumentProcessingParams();
// Set the TrainUserPatterns property
dpp->PageProcessingParams->RecognizerParams->TrainUserPatterns = VARIANT_TRUE;
// Create an empty user pattern file
Engine->CreateEmptyUserPattern( L"D:\\test.ptn" );
// Set the full path to the user pattern file
dpp->PageProcessingParams->RecognizerParams->UserPatternsFile = L"D:\\test.ptn";
// Process the image
frDocument->Process( dpp );
...
C# code
FREngine.IEngine engine;
FREngine.IFRDocument frdoc;
...
// Create a DocumentProcessingParams object
FREngine.IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
// Set the TrainUserPatterns property
dpp.PageProcessingParams.RecognizerParams.TrainUserPatterns = true;
// Create an empty user pattern file
string patternFile = "D:\\test.ptn";
engine.CreateEmptyUserPattern( patternFile );
// Set the full path to the user pattern file
dpp.PageProcessingParams.RecognizerParams.UserPatternsFile = patternFile;
// Process the image
frdoc.Process( dpp );
...