Extending the Bicriterion Approach for Anticlustering: Exact and Hybrid Approaches
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Numerous applications in psychological research require that a data set is partitioned via the inverse of a clustering criterion. This anticlustering seeks for high similarity between groups (maximum diversity) or high pairwise dissimilarity within groups (maximum dispersion). Brusco et al. (2020) proposed a bicriterion heuristic (BILS) that simultaneously seeks for maximum diversity and dispersion, introducing the bicriterion approach for anticlustering. We investigate if the bicriterion approach can be improved using exact algorithms that guarantee globally optimal criterion values. Despite the theoretical computational intractability of anticlustering, we present a new exact algorithm for maximum dispersion that scales to quite large data sets (N = 1000). However, a fully exact bicriterion approach was only feasible for small data sets (about N = 30). We therefore developed hybrid approaches that maintain optimal dispersion but use heuristics to maximize diversity on top of it. In a simulation study and an example application, we compared several hybrid approaches. An adaptation of BILS that initiates each iteration with a partition having optimal dispersion (BILS-Hybrid-All) performed best across a variety of data inputs. All of the methods developed here as well as the original BILS algorithm are available via the free and open source R package anticlust.