DCMatchBoosted - Improving Deep Clustering by Architecture Recommendation

Abstract

Deep clustering algorithms such as Deep Embedded Clustering (DEC), Deep K-Means (DKM), and Deep Clustering Network (DCN) are highly sensitive to the architecture of the underlying neural network, which heavily influences clustering quality. Although Neural Architecture Search (NAS) methods aim to configure deep neural networks properly, traditional NAS approaches are unsuitable in this context because no labels are available. We propose DCMatchBoosted, an extension of our previous framework (DCMatch), which leverages dataset characterization and a gradient boosting surrogate to recommend effective autoencoder architectures for deep clustering. Our method combines high-level semantic embeddings from CLIP with complementary statistical descriptors, both extracted from a small subset of randomly sampled images, to build a compact representation of each dataset. These dataset features are paired with architecture metadata and used to train an XGBoost model that predicts clustering performance. In extensive experiments on 20 image datasets and three clustering algorithms (DEC, DKM, DCN), DCMatchBoosted consistently outperforms the default configurations, achieving statistically significant improvements in clustering accuracy on the majority of datasets. Our code is available at https://github.com/mamdouhJ/DCMatch
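
The surrogate idea in the abstract (pair dataset features with architecture metadata, fit a regressor on observed clustering scores, then rank candidate architectures for a new dataset) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic, the two dataset descriptors and the two architecture fields (`depth`, `latent_dim`) are hypothetical, and a tiny pure-Python gradient-boosted stump regressor stands in for XGBoost so the example has no dependencies.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no feature has two distinct values: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]

def fit_boosting(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + lr * (lm if row[j] <= t else rm) for row, p in zip(X, preds)]
    return base, stumps, lr

def predict(model, row):
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def recommend(model, dataset_features, candidates):
    """Pick the candidate architecture with the highest predicted score."""
    return max(candidates, key=lambda arch: predict(model, dataset_features + arch))

# Synthetic training set (assumption): each row is
# [descriptor_0, descriptor_1, depth, latent_dim] with a made-up clustering score.
random.seed(0)
X, y = [], []
for _ in range(60):
    ds = [random.random(), random.random()]
    arch = [random.choice([2, 3, 4]), random.choice([8, 16, 32, 64])]
    score = 0.5 + 0.3 * ds[0] - 0.2 * abs(arch[1] / 64 - ds[1]) + random.gauss(0, 0.02)
    X.append(ds + arch)
    y.append(score)

model = fit_boosting(X, y)
mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
boosted_mse = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)

# Rank a hypothetical candidate grid for an unseen dataset's descriptors.
candidates = [[d, z] for d in (2, 3, 4) for z in (8, 16, 32, 64)]
best = recommend(model, [0.4, 0.9], candidates)
print("training MSE:", round(boosted_mse, 5), "vs constant baseline:", round(baseline_mse, 5))
print("recommended [depth, latent_dim]:", best)
```

In a setting closer to the one described, the descriptor vector would presumably come from CLIP embeddings plus statistical descriptors, and the regressor would be an XGBoost model; the ranking step stays the same in shape.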
