Large-scale Clustering via Fast Splitting of a Sparse Representative Tree Based on Local Density

Renmin Wang
Jie Li

Abstract

Large-scale clustering remains an active yet challenging task in data mining and machine learning, where existing algorithms often struggle to balance efficiency, accuracy, and adaptability. This paper proposes a novel large-scale clustering framework with three key innovations: (1) Parameter-free cluster discovery: unlike conventional methods that require a predefined number of clusters, our algorithm autonomously identifies natural cluster structures through dynamic density-based splitting decisions. (2) Hybrid sampling-partitioning strategy: by integrating randomized sampling with K-means-based partitioning, we extract high-quality representative points that preserve data integrity at linear computational complexity. (3) Local density-driven MST segmentation: a minimum spanning tree (MST) constructed from the representatives is adaptively partitioned using a local density criterion, which disconnects weakly associated edges by comparing density peaks between adjacent representative points. Extensive experiments on synthetic and real-world data sets (up to 20 million samples) show that the algorithm achieves higher clustering accuracy than state-of-the-art methods while reducing runtime. Notably, the framework is robust to the sampling ratio and eliminates dependency on user-specified parameters, which makes it well suited to real-world applications with complex, arbitrarily shaped data distributions.
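The abstract names the three stages but not their exact rules, so the following Python (NumPy) sketch is only a hypothetical illustration of how such a pipeline could fit together, not the authors' method. It assumes: plain Lloyd K-means on a random sample to produce representatives; Prim's algorithm for the MST over representatives; a representative's local scale taken as the mean distance to its `k` nearest representatives (its local density being the inverse); and an edge cut whenever it is longer than `ratio` times the larger local scale of its two endpoints. All function names and parameters (`sample_ratio`, `n_reps`, `k`, `ratio`) are illustrative assumptions. In particular, the fixed `ratio` threshold stands in for the paper's parameter-free density criterion, whose details are in the full article.

```python
import numpy as np


def sample_representatives(X, sample_ratio=0.1, n_reps=50, n_iter=20, seed=0):
    """Hypothetical stage (2): random sampling followed by plain Lloyd K-means.

    Returns n_reps centroids of the sample, used as representative points.
    Assumes len(X) >= n_reps.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    m = min(n, max(n_reps, int(n * sample_ratio)))
    sample = X[rng.choice(n, size=m, replace=False)]
    centers = sample[rng.choice(m, size=n_reps, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        # Assign every sampled point to its nearest centre (squared Euclidean).
        d2 = ((sample[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(n_reps):
            members = sample[labels == j]
            if len(members):  # an empty partition keeps its previous centre
                centers[j] = members.mean(axis=0)
    return centers


def mst_edges(P):
    """Prim's algorithm on the complete Euclidean graph over representatives."""
    m = len(P)
    dist = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
    in_tree = np.zeros(m, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()             # cheapest known connection to the tree
    parent = np.zeros(m, dtype=int)   # tree vertex providing that connection
    edges = []
    for _ in range(m - 1):
        v = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((int(parent[v]), v, float(dist[parent[v], v])))
        in_tree[v] = True
        closer = dist[v] < best
        best = np.where(closer, dist[v], best)
        parent = np.where(closer, v, parent)
    return edges, dist


def local_scale(dist, k=3):
    """Mean distance to the k nearest other representatives (column 0 is self).

    Its inverse serves as a hypothetical local density. Assumes len(dist) > k.
    """
    return np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)


def split_tree(m, edges, scale, ratio=2.0):
    """Hypothetical stage (3): cut MST edges much longer than the local scale.

    An edge is kept only if its length is at most `ratio` times the larger
    local scale of its two endpoints; kept edges are merged with union-find.
    """
    root = list(range(m))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]  # path halving
            a = root[a]
        return a

    for u, v, w in edges:
        if w <= ratio * max(scale[u], scale[v]):
            root[find(u)] = find(v)
    _, labels = np.unique([find(i) for i in range(m)], return_inverse=True)
    return labels


def cluster(X, sample_ratio=0.1, n_reps=50, k=3, ratio=2.0, seed=0):
    X = np.asarray(X, dtype=float)
    reps = sample_representatives(X, sample_ratio, n_reps, seed=seed)
    edges, dist = mst_edges(reps)
    rep_labels = split_tree(len(reps), edges, local_scale(dist, k), ratio)
    # Every point inherits the cluster of its nearest representative.
    d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=-1)
    return rep_labels[d2.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: two well-separated Gaussian blobs.
    X = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(20.0, 1.0, (500, 2))])
    labels = cluster(X)
    print("clusters found:", len(np.unique(labels)))
```

For a fixed number of representatives, the sampling, K-means, and final assignment steps scale linearly in the number of points, while the MST is built only over the small representative set. This split is the source of the efficiency the abstract claims. The final assignment here broadcasts an `n × n_reps × d` array in memory; at the 20-million-sample scale mentioned above, one would process `X` in chunks.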

Version published to 10.21203/rs.3.rs-6746982/v1 on Research Square (Jun 16, 2025)

Related articles

Improving K-Means Clustering: A Comparative Study of Parallelized Version of Modified K-Means Algorithm for Clustering of Satellite Images

This article has 3 authors:
1. Yuv Raj Pant
2. Larry Leigh
3. Juliana Fajardo Rueda
This article has no evaluations. Latest version: Jun 12, 2025.

Mining Spatial Co-location Patterns via γ-Quasi-Clique Detection

This article has 4 authors:
1. Peijie Jin
2. Xiaoxuan Wang
3. Pan Tan
4. Wen Xiong
This article has no evaluations. Latest version: Jun 25, 2025.

Block Probabilistic Distance Clustering : A Unified Framework and Evaluation

This article has 2 authors:
1. Shrikrishna Bhat Kapu
2. Kiruthika C
This article has no evaluations. Latest version: Jun 26, 2025.
