Using Voting API
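Developers can combine several recognition engines in one recognition solution. When several engines produce different recognition variants for a character or a word, the best variant can be selected by voting among those variants. To support voting, ABBYY FineReader Engine provides a special Voting API, which supplies different hypotheses for the recognition of a character or a word, each with a corresponding weight. Developers can also use the Voting API to check recognition results against their own databases and algorithms and to correct the text. For example, they can build words from individual letters or examine all the generated hypotheses.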
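The sketch below illustrates one possible way to vote among word hypotheses coming from several engines. It uses plain C++ stand-ins rather than the FineReader Engine interfaces: the Hypothesis struct and the voteForWord function are hypothetical names, and the weights are assumed to be supplied by the application (for example, a trust factor per engine). It implements simple weighted majority voting: identical texts pool their weights, and the text with the largest total wins.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one hypothesis reported by one engine:
// the recognized text plus a weight chosen by the application.
// These names are illustrative, not FineReader Engine API.
struct Hypothesis {
    std::string text;
    double weight;
};

// Weighted majority vote: sum the weights of identical hypotheses
// across all engines and return the text with the largest total.
// Ties go to the lexicographically smallest text, because std::map
// iterates in key order and only a strictly larger total replaces
// the current winner.
std::string voteForWord(const std::vector<Hypothesis>& hypotheses) {
    assert(!hypotheses.empty());
    std::map<std::string, double> totals;
    for (const Hypothesis& h : hypotheses) {
        totals[h.text] += h.weight;
    }
    auto best = totals.begin();
    for (auto it = totals.begin(); it != totals.end(); ++it) {
        if (it->second > best->second) {
            best = it;
        }
    }
    return best->first;
}
```

For instance, two engines voting "cat" with weights 0.6 and 0.3 outweigh a third engine's "cot" at 0.5. A real solution would also have to align words between engines before voting, which this sketch does not attempt.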
Important! The Voting API is not available for handwritten text recognition.
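The WordRecognitionVariants object represents the collection of hypotheses for a word, and the CharacterRecognitionVariants object represents the collection of hypotheses for a character. The elements of these collections are WordRecognitionVariant and CharacterRecognitionVariant objects, respectively.
The WordRecognitionVariant object represents a single hypothesis for a word and contains the text of the hypothesis, the model type, the average stroke width, and information on whether the hypothesis was found in the dictionary. Individual character parameters can be accessed through the GetCharParams method of this object.
The CharacterRecognitionVariant object represents a single hypothesis for a character and contains the character confidence, the probability that the character is written in a serif font, and information on whether the character is a superscript or a subscript.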
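As an illustration of checking hypotheses against your own data, the sketch below walks the word hypotheses in their stored order (best to worst) and returns the first one found in an application-supplied word list, falling back to the top-ranked variant. WordVariant and pickWithOwnDictionary are hypothetical stand-ins that model only the hypothesis text of a WordRecognitionVariant; they are not part of the FineReader Engine API.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a WordRecognitionVariant: only the
// hypothesis text is modeled here.
struct WordVariant {
    std::string text;
};

// Walks the hypotheses in stored order (best to worst) and returns
// the first one present in the application's own word list.
// If none matches, the engine's top-ranked variant is kept.
std::string pickWithOwnDictionary(const std::vector<WordVariant>& variants,
                                  const std::unordered_set<std::string>& dictionary) {
    assert(!variants.empty());
    for (const WordVariant& v : variants) {
        if (dictionary.count(v.text) != 0) {
            return v.text;
        }
    }
    return variants.front().text;
}
```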
How to retrieve the recognition variants of a word or character
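To find all recognition hypotheses for a word or character, do the following:
- Set the SaveWordRecognitionVariants and SaveCharacterRecognitionVariants properties of the RecognizerParams object to TRUE. This tells FineReader Engine to save the recognition variants in the recognition results.
- Pass the RecognizerParams object to one of the ABBYY FineReader Engine recognition methods as a child object of the PageProcessingParams object (or of the DocumentProcessingParams object that contains the PageProcessingParams object).
- After recognition, access the hypothesis collections through the ICharParams::WordRecognitionVariants and ICharParams::CharacterRecognitionVariants properties and the IParagraph::GetWordRecognitionVariants method.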
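Note: These properties and this method return 0 for non-printable characters (spaces, carriage returns, and so on) and for characters that were added to the text during editing rather than recognized. They also return 0 if the text was recognized by an earlier version of ABBYY FineReader Engine. Each hypothesis collection contains the recognition variants sorted from best to worst. If the SaveWordRecognitionVariants or SaveCharacterRecognitionVariants property of the RecognizerParams object is set to FALSE, the corresponding collection contains only one element.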
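The note above has two practical consequences for code that walks the collections: a character may have no collection at all, and a collection holds more than one element only if variants were saved. The sketch below models both with hypothetical stand-ins (CharVariant, CharVariants, LineVariants and ambiguousPositions are illustrative names, not engine API), using std::nullopt where the engine returns 0, and reports the positions of characters for which more than one hypothesis is available.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for one CharacterRecognitionVariant.
struct CharVariant {
    wchar_t character;
    int confidence;
};
// Hypothetical stand-in for a CharacterRecognitionVariants collection,
// assumed to be sorted best to worst as the engine stores it.
using CharVariants = std::vector<CharVariant>;

// One entry per character of a recognized line. std::nullopt models
// the 0 returned for non-printable characters, characters added while
// editing, and text recognized by an earlier engine version.
using LineVariants = std::vector<std::optional<CharVariants>>;

// Returns the positions of characters that have more than one
// hypothesis. With SaveCharacterRecognitionVariants set to FALSE every
// collection has a single element, so the result is always empty.
std::vector<std::size_t> ambiguousPositions(const LineVariants& line) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (!line[i].has_value()) {
            continue;  // no collection for this character
        }
        assert(!line[i]->empty());
        if (line[i]->size() > 1) {
            positions.push_back(i);
        }
    }
    return positions;
}
```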
What is the difference between CharConfidence, ErrorProbability, and IsSuspicious
To find out whether a character was recognized unreliably and needs verification, use the IsSuspicious property of the PlainText or CharParams object for that character. This property is calculated from ErrorProbability.
For finer-grained distinctions, use the ErrorProbability property of the PlainText or CharParams object, which returns the estimated probability (from 0 to 100) that the character was recognized incorrectly. This estimate takes into account the context in which the character occurs, for example, whether the word containing the character is found in the dictionary.
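A verification rule based on these two properties might look like the sketch below. CharCheck and needsVerification are hypothetical stand-ins for values read from PlainText or CharParams, and the optional errorThreshold is an application-chosen cutoff, not a value recommended by the engine documentation.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-character values read from
// PlainText or CharParams.
struct CharCheck {
    bool isSuspicious;     // value of IsSuspicious
    int errorProbability;  // value of ErrorProbability, 0..100
};

// Flags a character for manual verification. IsSuspicious is the
// engine's own verdict; the optional threshold lets the application
// be stricter by also flagging characters whose estimated error
// probability reaches the chosen value. The default of 101 disables
// the extra check because ErrorProbability never exceeds 100.
bool needsVerification(const CharCheck& c, int errorThreshold = 101) {
    assert(c.errorProbability >= 0 && c.errorProbability <= 100);
    return c.isSuspicious || c.errorProbability >= errorThreshold;
}
```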
You can also get the character confidence, both for the recognized character (the CharConfidence property of the PlainText object) and for every recognition variant (the CharConfidence property of the CharacterRecognitionVariant object). Confidence estimates accuracy from the image of a single character only, without considering context. Confidence values of different characters are therefore not comparable with each other, and the only safe use of confidence is to compare several recognition variants of the same image (character).
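Because confidence values are only comparable among the variants of the same character, one safe use is to measure how clearly the best variant beats the runner-up for a single character image. In the hypothetical sketch below, VariantConfidences holds the CharConfidence values of one character's CharacterRecognitionVariant objects, and confidenceMargin returns the gap between the two highest values.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in: the CharConfidence values of all
// CharacterRecognitionVariant objects for ONE character image.
// Never mix values that belong to different characters.
using VariantConfidences = std::vector<int>;

// Returns the gap between the highest and second-highest confidence
// among the hypotheses of a single character, or std::nullopt when
// only one hypothesis is available (e.g., variants were not saved).
std::optional<int> confidenceMargin(VariantConfidences confidences) {
    assert(!confidences.empty());
    if (confidences.size() < 2) {
        return std::nullopt;
    }
    // Sort the local copy in descending order so the two largest values
    // come first, without relying on the stored best-to-worst order.
    std::sort(confidences.begin(), confidences.end(), std::greater<int>());
    return confidences[0] - confidences[1];
}
```

Since the margin is derived from confidences, it inherits their limitation: it describes one character image and is not meant for ranking different characters against each other.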
None of these properties is meaningful for characters obtained without recognition, for example, characters taken directly from the source PDF file.
See also
07.11.2025 12:48:30