A Generalized Geometric Theory of Centroid Discriminant Analysis for Linear Classification of Multi-Dimensional Data
Abstract
Linear classifiers are preferred for some tasks because they overfit less and provide interpretable decision boundaries. Yet, achieving both scalability and predictive performance remains challenging. Here, we propose a theoretical framework named geometric discriminant analysis (GDA). GDA encompasses the family of linear classifiers that can be expressed as a function of the centroid discriminant basis (CDB0), the line connecting two class centroids, adjusted by geometric corrections under different constraints. We demonstrate that linear discriminant analysis (LDA) is a subcase of the GDA theory, and we show that LDA converges to CDB0 under certain conditions. Then, based on the GDA framework, we propose an efficient linear classifier named centroid discriminant analysis (CDA), defined as a special case of GDA under a two-dimensional (2D) plane geometric constraint. CDA training is initialized from CDB0 and iteratively computes adjusted centroid discriminant lines whose optimal rotations within the associated 2D planes are found via Bayesian optimization. CDA scales well, with quadratic time complexity, lower than the cubic complexity of LDA and the support vector machine (SVM). Results on 27 real datasets spanning classification tasks on standard images, medical images, and chemical properties offer empirical evidence that CDA outperforms other linear methods such as LDA, SVM, and fast SVM in terms of scalability, performance, and stability. CDA competes with the state-of-the-art method ResNet on tasks such as adrenal gland disease classification from medical images, while exhibiting less tendency to overfit. The GDA theory may inspire new linear classifiers defined under different geometric constraints.
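To make the CDA procedure described in the abstract concrete, the following is a minimal sketch of the idea: start from CDB0, repeatedly build a 2D plane containing the current discriminant direction, and search for the rotation angle within that plane that best separates the classes. This is an illustrative reconstruction, not the authors' implementation: the choice of correction direction used to span the 2D plane, the midpoint threshold rule, and the use of a 1D bounded optimizer in place of Bayesian optimization are all assumptions made here for brevity.

import numpy as np
from scipy.optimize import minimize_scalar

def cdb0(X, y):
    """Initial centroid discriminant basis: unit vector from the
    centroid of class 0 to the centroid of class 1."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    w = c1 - c0
    return w / np.linalg.norm(w)

def accuracy(w, X, y):
    """Score a direction w by thresholding the 1D projection at the
    midpoint of the projected class means (an assumed decision rule)."""
    z = X @ w
    t = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
    return np.mean((z > t) == (y == 1))

def cda_fit(X, y, n_iter=10):
    """Hypothetical CDA training loop: rotate w within successive 2D
    planes, choosing each rotation angle by 1D optimization."""
    w = cdb0(X, y)
    for _ in range(n_iter):
        # Find currently misclassified points under the midpoint rule.
        z = X @ w
        t = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
        wrong = (z > t) != (y == 1)
        if not wrong.any():
            break
        # Assumed correction direction: from the overall mean toward the
        # centroid of misclassified points, orthogonalized against w so
        # that (w, v) spans a 2D plane.
        v = X[wrong].mean(axis=0) - X.mean(axis=0)
        v = v - (v @ w) * w
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            break
        v /= norm
        # Search the optimal rotation of w within the (w, v) plane; a
        # bounded 1D optimizer stands in for Bayesian optimization here.
        obj = lambda a: -accuracy(np.cos(a) * w + np.sin(a) * v, X, y)
        res = minimize_scalar(obj, bounds=(-np.pi / 2, np.pi / 2),
                              method="bounded")
        w = np.cos(res.x) * w + np.sin(res.x) * v
    return w

Because each iteration only requires projections of the n samples onto two directions plus a 1D angle search, each step costs roughly O(nd), which is consistent with the quadratic overall complexity claimed for CDA, in contrast to the cubic cost of solving the LDA or SVM problems directly.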