Precision/recall balance

Cases where a classifier fails to classify a document correctly fall into two categories:

  1. The classifier assigns the wrong class to the document, e.g. a type A page is classified as type B.
  2. The classifier fails to assign any class to the document.

Two parameters derived from these categories can be used to describe classification quality:

  • Precision is the ratio of documents that were assigned a specific class correctly to all of the documents that were assigned that class (i.e. the sum of documents that were correctly assigned that class and documents that were mistakenly assigned that class).

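  • Recall is the ratio of documents that were assigned a specific class correctly to all of the documents that actually belong to that class (i.e. the sum of documents that were correctly assigned that class and documents of that class that were assigned a different class or no class at all).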
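
The two definitions above can be checked on a small hand-made sample. The following is a minimal sketch, not part of FlexiCapture: the function name precision_recall, the label values, and the sample data are all hypothetical, and None is assumed to stand for a document that received no class.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for the class `target`.

    A predicted label of None means the classifier assigned no class
    (an assumption made for this illustration).
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Documents that were assigned `target` and really belong to it.
    correct = sum(1 for t, p in pairs if p == target and t == target)
    # All documents that were assigned `target`, rightly or wrongly.
    assigned = sum(1 for _, p in pairs if p == target)
    # All documents that really belong to `target`.
    actual = sum(1 for t, _ in pairs if t == target)
    precision = correct / assigned if assigned else 0.0
    recall = correct / actual if actual else 0.0
    return precision, recall


# Hypothetical sample: four invoices and two contracts.
true_labels = ["invoice", "invoice", "invoice", "invoice", "contract", "contract"]
predicted = ["invoice", "invoice", "contract", None, "contract", "invoice"]

invoice_precision, invoice_recall = precision_recall(true_labels, predicted, "invoice")
print(f"invoice precision: {invoice_precision:.2f}")  # 2 correct of 3 assigned
print(f"invoice recall:    {invoice_recall:.2f}")     # 2 correct of 4 actual
```

In this sample, the invoice that received no class lowers recall but leaves precision untouched, while the contract that was mistaken for an invoice lowers precision.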

You can adjust classification settings to prioritize recall or precision.

Prioritizing precision

If you want the number of documents that receive the wrong class to be as low as possible at the expense of some documents not receiving any class at all, use the High precision setting.

Example

A company needs to classify invoices and contracts so that they can be sent to departments responsible for handling each type of document.

If FlexiCapture classifies an invoice incorrectly, the invoice will not be sent to the right department and will not be paid. If FlexiCapture does not classify the invoice at all, the invoice can be classified manually and sent to the right department.

In this case, avoiding incorrect classification is more important than classifying as many documents as possible.

Prioritizing recall

If you want the number of documents that are not assigned any class to be as low as possible at the expense of some documents being assigned the wrong class, use the High recall setting.

Example

A company needs to identify relevant loan documents that need to be processed among various other loan documents.

If FlexiCapture does not assign a class to a relevant document, it will not be processed.

Documents that were assigned the wrong class can be caught by additional checks, e.g. by applying a FlexiLayout, using validation rules, or processing them manually.

In this case, it is more important to assign a class to as many relevant documents as possible than to assign the correct class as frequently as possible.

By default, Classification priority is set to balanced.
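
One way to picture the trade-off between these settings is a confidence threshold: a class is assigned only when the classifier is sufficiently sure. The sketch below is an illustration under that assumption only. It does not describe how FlexiCapture implements classification priority, and the scores, threshold values, and function names are hypothetical.

```python
def classify(scores, threshold):
    """Assign the top-scoring class only if its score reaches the threshold."""
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Hypothetical documents: (confidence score per class, true class).
documents = [
    ({"invoice": 0.95, "contract": 0.05}, "invoice"),
    ({"invoice": 0.60, "contract": 0.40}, "contract"),
    ({"invoice": 0.55, "contract": 0.45}, "invoice"),
    ({"invoice": 0.10, "contract": 0.90}, "contract"),
]


def count_outcomes(threshold):
    """Return (wrongly classified, unclassified) counts for a threshold."""
    results = [(classify(scores, threshold), truth) for scores, truth in documents]
    wrong = sum(1 for predicted, truth in results if predicted is not None and predicted != truth)
    unclassified = sum(1 for predicted, _ in results if predicted is None)
    return wrong, unclassified


# A strict threshold mimics prioritizing precision, a lenient one prioritizing recall.
for name, threshold in [("strict (0.8)", 0.8), ("lenient (0.5)", 0.5)]:
    wrong, unclassified = count_outcomes(threshold)
    print(f"{name}: wrong={wrong}, unclassified={unclassified}")
```

With the strict threshold, no document receives the wrong class but two receive no class. With the lenient threshold, every document receives a class but one of them is wrong.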

12/1/2020 7:03:59 AM

