A Statistical Approach to the Confusion Matrix for Classification Problems Using Machine Learning

Rafael Sanchez-Marquez
Jose Jabaloyes Vivas

Abstract

The main contribution of this article is a complete and practical method that machine learning and quality practitioners can use to estimate the lower bound of the intrinsic kappa coefficient and of accuracy. The kappa statistic is one of the most widely used methods for evaluating the effectiveness of quality inspections based on attributive characteristics. Kappa and accuracy are also used extensively for classification problems in machine learning. This article develops exact and approximate methods to estimate the lower bound of kappa's "intrinsic" value for any number of categories. In addition, two methods (exact and approximate) are provided to estimate the accuracy lower bound for machine learning practitioners who prefer this performance metric. For both the intrinsic kappa coefficient and accuracy, the results show that the approximate methods' estimates are very close to those of the exact method over a wide range of sample sizes and numbers of misclassified instances, indicating that the approximate methods can be used for any number of categories. Real-life examples illustrate the use of the method for practitioners.
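To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' procedure. It computes the sample accuracy and the ordinary (Cohen's) sample kappa from a confusion matrix, then two one-sided lower bounds for accuracy: an exact binomial (Clopper-Pearson style) bound found by bisection, and a normal-approximation (Wald) bound. These two bounds are stand-ins chosen for illustration; the abstract does not say which exact and approximate methods the article uses, and the sketch does not estimate the "intrinsic" kappa. The function names, the example matrix, and the 95% confidence level are all assumptions.

```python
import math


def accuracy_and_kappa(cm):
    """Sample accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the count of instances with true class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    p_o = correct / n  # observed agreement (accuracy)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def exact_lower_bound(correct, n, alpha=0.05):
    """One-sided exact binomial lower bound (assumed stand-in for an exact method).

    Finds p such that P(X >= correct | n, p) = alpha by bisection.
    """
    if correct == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_upper_tail(correct, n, mid) < alpha:
            lo = mid  # tail probability too small: the bound lies higher
        else:
            hi = mid
    return lo


def wald_lower_bound(correct, n, z=1.6448536269514722):
    """One-sided normal-approximation lower bound; z is the 95% quantile."""
    p = correct / n
    return max(0.0, p - z * math.sqrt(p * (1 - p) / n))


# Hypothetical 2-class confusion matrix: 100 instances, 15 misclassified
cm = [[45, 5],
      [10, 40]]
acc, kappa = accuracy_and_kappa(cm)
n_total = sum(sum(row) for row in cm)
n_correct = sum(cm[i][i] for i in range(len(cm)))
acc_lb_exact = exact_lower_bound(n_correct, n_total)
acc_lb_wald = wald_lower_bound(n_correct, n_total)
print(acc, kappa, acc_lb_exact, acc_lb_wald)
```

For this hypothetical matrix the accuracy is 0.85 and the sample kappa is 0.70 (up to floating-point rounding). Both lower bounds fall a few points below the observed accuracy, which is the kind of closeness between exact and approximate estimates that the abstract describes for the article's own methods.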

Version published to 10.20944/preprints202506.0273.v1
Jun 4, 2025
