Tips for improving classification quality

If a classifier produces unsatisfactory results for a batch of documents, try the following:

  • Check that you have correctly adjusted the precision-recall slider;
  • Increase the number of documents. The larger your selection of documents, the more variants of the same class the classifier will be able to recognize;
  • Create additional rules for better differentiation between classes.
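
The effect of the precision-recall setting can be illustrated with a minimal sketch. This is not the product's implementation: it assumes the setting behaves like a confidence threshold, and the function name `precision_recall_at` and all scores below are made up for illustration.

```python
def precision_recall_at(threshold, scored_docs):
    """Return (precision, recall) for one class at a given confidence threshold.

    scored_docs is a list of (confidence, belongs_to_class) pairs:
    confidence is the classifier's score for the class, belongs_to_class
    is the reference answer. Both are hypothetical example data.
    """
    true_pos = sum(1 for score, member in scored_docs if score >= threshold and member)
    false_pos = sum(1 for score, member in scored_docs if score >= threshold and not member)
    false_neg = sum(1 for score, member in scored_docs if score < threshold and member)

    accepted = true_pos + false_pos   # documents assigned to the class
    members = true_pos + false_neg    # documents that really belong to it
    precision = true_pos / accepted if accepted else 1.0
    recall = true_pos / members if members else 1.0
    return precision, recall


# Illustrative scores only, not the output of a real classifier.
docs = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
        (0.60, False), (0.40, True), (0.30, False), (0.10, False)]

for threshold in (0.25, 0.5, 0.85):
    p, r = precision_recall_at(threshold, docs)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall, which is the trade-off to keep in mind when choosing the setting.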

To improve the classification quality, do the following:

  • Check that you have correctly set the desired precision and recall values;
  • Add more relevant documents to the training set. This makes the class attributes more precise and lets the classification algorithm be optimized, which improves the quality of the trained classifier;
  • Review the incorrectly classified documents in the yellow cells of the table in the Confusion matrix tab (right-click to view them).
    If the meaning of the text and the selected attributes make it clear that the reference class was assigned incorrectly, assign the correct class instead. If the document's reference class cannot be determined with certainty, remove the document from the training batch;
  • The training batch may contain thematically similar classes that are hard to tell apart even for human experts. Combine such classes into one;
  • Create additional rules to make differentiating between classes easier.
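
The review of misclassified documents and the search for classes worth combining can be sketched outside the product as follows. This is only an illustration under stated assumptions: it supposes you have exported each document's reference class and assigned class, and the helper names, class names, and document IDs are hypothetical.

```python
from collections import Counter


def confusion_counts(reference, predicted):
    """Count how often each (reference class, assigned class) pair occurs."""
    return Counter(zip(reference, predicted))


def misclassified(doc_ids, reference, predicted):
    """List (doc_id, reference class, assigned class) for every wrong result."""
    return [(d, r, p) for d, r, p in zip(doc_ids, reference, predicted) if r != p]


def most_confused_pair(counts):
    """Return ((class_a, class_b), errors) for the pair confused most often, or None."""
    pair_errors = Counter()
    for (ref, pred), n in counts.items():
        if ref != pred:
            # Errors in both directions count toward the same pair.
            pair_errors[frozenset((ref, pred))] += n
    if not pair_errors:
        return None
    pair, errors = pair_errors.most_common(1)[0]
    return tuple(sorted(pair)), errors


# Hypothetical export: IDs, reference classes, and classes assigned by the classifier.
doc_ids = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
reference = ["invoice", "invoice", "receipt", "receipt",
             "contract", "invoice", "receipt", "contract"]
predicted = ["invoice", "receipt", "invoice", "receipt",
             "contract", "receipt", "receipt", "contract"]

counts = confusion_counts(reference, predicted)
for doc_id, ref, pred in misclassified(doc_ids, reference, predicted):
    print(f"{doc_id}: reference={ref}, assigned={pred}")
print("Most confused pair:", most_confused_pair(counts))
```

Documents listed by `misclassified` are the ones to re-check for a wrong reference class. A pair reported by `most_confused_pair` with many errors is a candidate for merging or for an additional rule, though that decision still needs human judgment.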

