The Fraction-product: A Novel Discriminant Statistic for Binary Classification
Abstract
Background: This paper characterizes the fraction-product, a novel discriminant statistic that we have found to be extremely useful for feature selection on spectroscopic data. In supervised binary classification, the fraction-product measures the amount of taxonomic information in each attribute. The simplicity of the idea makes it easy to adapt to different data sets and, in some settings, leads to new, useful measures. After a discussion of its mathematical foundation, the fraction-product is applied in a worked example to the Diagnostic Wisconsin Breast Cancer Database.

Results: The analysis of these non-spectroscopic data suggests the utility of another new measure, called taxonomic potential. Given two attributes, the taxonomic potential measures the potential for one feature to carry taxonomic information that is not explained by its correlation with the other feature. The fraction-product and taxonomic potential allow the rapid selection of four features which, after weighting with linear discriminant analysis, lead to accuracy = 97.9%, recall = 100%, and precision = 92.2%. Moreover, the three major features are stable with respect to variations of the training set.

Conclusions: The fraction-product is a new discriminant statistic that has been useful in supervised binary classification on two very different data sets: spectra and geometric measures of cell nuclei. It is simple and can easily be adapted to the unique features of a data set for the best outcomes.
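
For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, recall, and precision are conventionally computed from binary predictions. The function name `binary_metrics`, the choice of `1` as the positive class, and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, recall, precision) for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive one (an assumption, not from the paper).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)                # fraction of correct calls
    recall = tp / (tp + fn) if tp + fn else 0.0      # positives that were found
    precision = tp / (tp + fp) if tp + fp else 0.0   # positive calls that were right
    return accuracy, recall, precision


# Toy labels, purely illustrative: every positive is found,
# but one negative is wrongly called positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
acc, rec, prec = binary_metrics(y_true, y_pred)
print(acc, rec, prec)
```

The toy case has the same qualitative pattern as the reported figures: a recall of 1.0 means there are no false negatives, while a precision below 1.0 means some negatives were classified as positive.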