Classifier statistics
After training is complete, the classification results are compiled into a statistics report. Each page's result is compared with its reference class and counted as follows:
- If the class assigned to a page matches the page's reference class, the result is counted as a True Positive (TP);
- If no class was assigned to a page that has no reference class, the result is counted as a True Negative (TN);
- If the class assigned to a page does not match the page's reference class, the result is counted as a False Positive (FP);
- If no class was assigned to a page that has a reference class, the result is counted as a False Negative (FN).
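The four cases above can be expressed as a minimal Python sketch. It assumes each page is described by its reference class and its assigned class, with None meaning "no class". The function name page_outcome and the class names Invoice and Contract are hypothetical and not part of the program.

```python
from typing import Optional


def page_outcome(reference: Optional[str], assigned: Optional[str]) -> str:
    """Return "TP", "TN", "FP" or "FN" for one page (illustrative sketch).

    reference: the page's reference class, or None if it has none.
    assigned:  the class the classifier assigned, or None if none was assigned.
    """
    if assigned is None:
        # No class assigned: correct only if the page has no reference class.
        return "TN" if reference is None else "FN"
    # A class was assigned: correct only if it matches the reference class.
    return "TP" if assigned == reference else "FP"


print(page_outcome("Invoice", "Invoice"))   # TP
print(page_outcome(None, None))             # TN
print(page_outcome("Invoice", "Contract"))  # FP
print(page_outcome("Invoice", None))        # FN
```

A page with no reference class that was nevertheless assigned a class falls into the FP branch, since the assigned class does not match a reference class.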
As a result, a tally is kept for each class, recording the number of times the class was:
- correctly assigned (TP);
- correctly not assigned (TN);
- incorrectly assigned (FP);
- incorrectly not assigned (FN).
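The per-class tally can be sketched in Python as follows. This is a minimal illustration, assuming a one-vs-rest view in which each class is checked separately against every page's (reference class, assigned class) pair, with None meaning "no class". The function name tally_per_class and the sample class names are hypothetical; the program's own counting rules may differ.

```python
from typing import Dict, Iterable, Optional, Tuple

# A result pairs a page's reference class with the class assigned to it;
# None stands for "no class". Names here are illustrative, not a product API.
Result = Tuple[Optional[str], Optional[str]]


def tally_per_class(results: Iterable[Result],
                    classes: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count TP/TN/FP/FN for each class, treating each class one-vs-rest (assumed)."""
    pages = list(results)  # allow any iterable to be scanned once per class
    tallies: Dict[str, Dict[str, int]] = {}
    for cls in classes:
        counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
        for reference, assigned in pages:
            if assigned == cls and reference == cls:
                counts["TP"] += 1  # correctly assigned
            elif assigned == cls:
                counts["FP"] += 1  # incorrectly assigned
            elif reference == cls:
                counts["FN"] += 1  # incorrectly not assigned
            else:
                counts["TN"] += 1  # correctly not assigned
        tallies[cls] = counts
    return tallies


# Hypothetical results for five pages.
pages = [
    ("Invoice", "Invoice"),
    ("Invoice", "Contract"),
    ("Contract", "Contract"),
    ("Contract", None),
    (None, None),
]
for cls, counts in tally_per_class(pages, ["Invoice", "Contract"]).items():
    print(cls, counts)
```

Under this one-vs-rest assumption, the page whose reference class is Invoice but which was assigned Contract counts as an FP for Contract and as an FN for Invoice.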
To view the statistics, select Classifier > Show statistics.
The higher the precision, recall, and F-measure, the better the classification results. The F-measure balances precision and recall, combining them into a single value that can be used to evaluate overall classification quality (for details about how the F-measure is calculated, see Glossary). For information about increasing the F-measure, see the Tips for improving classification quality section.
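The exact F-measure formula is given in the Glossary. As a hedged sketch, the Python below assumes the usual definitions computed from one class's counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-measure as the harmonic mean of the two (F1). The function names and the choice to return 0.0 when a denominator is zero are assumptions, not the program's documented behavior.

```python
def precision(tp: int, fp: int) -> float:
    """Share of a class's assignments that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0  # assumed 0.0 when nothing was assigned


def recall(tp: int, fn: int) -> float:
    """Share of pages with this reference class that received it: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0  # assumed 0.0 when the class never occurs


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for one class.
p = precision(tp=8, fp=2)  # 0.8
r = recall(tp=8, fn=8)     # 0.5
print(round(f_measure(p, r), 4))  # 0.6154
```

Because the harmonic mean is pulled toward the lower of the two values, a class with high precision but low recall (or the reverse) still receives a modest F-measure, which is what makes it a balanced summary.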
Additional statistics for evaluating classification quality are available on the following tabs:
- Confusion Matrix. The confusion matrix visually shows which classes of documents the classifier confuses most often. The cells on the diagonal of the matrix show how many documents were classified correctly. The rightmost column and the bottom row contain information about documents that were not assigned any class. All other cells show documents that were classified incorrectly;
- Confusing Classes. This tab lists the classes that the classifier mixed up, helping you find out which classes are most often confused with each other;
- Statistics by Class. This tab shows detailed statistics for each class and helps you identify the classes that cause the classifier to make the most mistakes.
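The Confusion Matrix and Confusing Classes tabs described above can be approximated with a minimal Python sketch. It assumes results are (reference class, assigned class) pairs with None for "no class", places reference classes on rows and assigned classes on columns, and adds a hypothetical "(none)" label as the last row and column. The function names and class names are illustrative only, and the program's own layout and ordering may differ.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

NONE = "(none)"  # hypothetical label for "no reference class" / "no assigned class"
Result = Tuple[Optional[str], Optional[str]]  # (reference class, assigned class)


def confusion_matrix(results: Iterable[Result],
                     classes: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Rows are reference classes, columns are assigned classes; NONE comes last.

    Every class that appears in the results must be listed in `classes`.
    """
    labels = list(classes) + [NONE]
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for reference, assigned in results:
        row = index[NONE if reference is None else reference]
        col = index[NONE if assigned is None else assigned]
        matrix[row][col] += 1
    return labels, matrix


def confused_pairs(results: Iterable[Result]) -> List[Tuple[Tuple[str, str], int]]:
    """Count (reference, assigned) pairs of two different classes, most frequent first."""
    counts = Counter(
        (reference, assigned)
        for reference, assigned in results
        if reference is not None and assigned is not None and reference != assigned
    )
    return counts.most_common()


# Hypothetical results for five pages.
pages = [
    ("Invoice", "Invoice"),
    ("Invoice", "Contract"),
    ("Invoice", "Contract"),
    ("Contract", "Contract"),
    ("Contract", None),
]
labels, matrix = confusion_matrix(pages, ["Invoice", "Contract"])
for label, row in zip(labels, matrix):
    print(f"{label:>10} {row}")  # diagonal = correct, off-diagonal = errors
print(confused_pairs(pages))
```

In this example, the Invoice row shows two pages that were assigned Contract, so that pair tops the confused-pairs list, while the "(none)" column counts the Contract page that was not assigned any class.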
12.04.2024 18:16:02