Clustering for ranking multivariate data by Linear Ordered Partitions

Abstract

This paper illustrates how clustering can be used to rank a multivariate set of observations. The link between ranking and clustering is provided by Linear Ordered Partitions (LOP): ranking corresponds to detecting the optimal clustering of the multivariate units into ordered equivalence classes. Ranking clusters provides more information than simply ordering the units, since it identifies classes within which the observed units are considered “incomparable”. The goal, in this ranking framework, is to partition the units into the largest number of clusters whose centroids, each representing the incomparable units within its cluster, are statistically different from each other, with the clusters optimally ranked. The final result is thus a ranking (total order) of clusters, where units within a cluster are treated as “ties”, i.e., incomparable. In this paper, we propose a model that identifies the best Least-Squares (LS) Linear Ordered Partition of the units, together with the univariate linear transformation of the observed variables needed to detect that LOP. The model therefore simultaneously identifies the optimal LS LOP of the units and the associated LS orthogonal projection of the observed multivariate units onto a straight line. This projection statistically represents a composite indicator that synthesizes the observed variables. The theoretical properties of the proposed model are fully discussed, and an extensive simulation study with 5400 generated data sets shows its performance under different scenarios. An application illustrates the potential of this approach to data analysis. The paper closes with a discussion of future developments and some conclusions.
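
To make the idea of a ranking of clusters concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a simpler two-step stand-in: the units are projected onto the first principal direction (a least-squares projection onto a line), and the resulting one-dimensional scores are grouped with plain k-means, after which the clusters are ranked by centroid. The paper instead estimates the projection and the partition jointly and selects the number of clusters from the statistical separation of the centroids. The function name `ranked_clusters`, the fixed number of clusters `k`, the quantile initialization, and the sign convention are all assumptions made for illustration.

```python
import numpy as np


def ranked_clusters(X, k, n_iter=100):
    """Illustrative stand-in: PCA projection followed by 1-D k-means.

    Returns (scores, ranks, centroids), where ranks[i] is the ordered
    class of unit i (0 = lowest) and centroids are sorted ascending.
    Assumes every column of X has non-zero variance.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that no variable dominates because of its scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # First right singular vector = direction of the least-squares line.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    w = vt[0]
    if w.sum() < 0:  # assumed convention: larger values mean higher rank
        w = -w
    scores = Z @ w  # one composite score per unit

    # Lloyd's algorithm in one dimension, started from evenly spaced quantiles.
    centroids = np.quantile(scores, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(scores[:, None] - centroids[None, :]), axis=1)
        updated = np.array([
            scores[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.argmin(np.abs(scores[:, None] - centroids[None, :]), axis=1)

    # Relabel clusters so that class 0 has the smallest centroid.
    order = np.argsort(centroids)
    rank_of = np.empty(k, dtype=int)
    rank_of[order] = np.arange(k)
    return scores, rank_of[labels], centroids[order]
```

Units that share a rank are the "ties" of the abstract: the sketch orders the classes but makes no claim about the order of units inside a class.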
