A Comprehensive Survey on Clustering Algorithms: Concepts, Taxonomy with Nature-Inspired Meta-Heuristic Approaches and Performance Metrics

Abstract

Clustering is a core technique in unsupervised learning that organizes unlabeled data into meaningful groups based on similarity. It has wide applications in domains such as bioinformatics, pattern recognition, social network analysis, computer vision, and artificial intelligence. Owing to the diversity and complexity of real-world datasets, numerous clustering paradigms have been developed, each with specific advantages and limitations. This survey provides a structured overview of classical and advanced clustering approaches, including hierarchical, partition-based, density-based, model-based, subspace, grid-based, and search-based metaheuristic techniques. We further examine commonly used similarity measures and validation metrics, including internal and external evaluation criteria, to highlight their role in assessing clustering quality. A comparative taxonomy is presented to clarify algorithmic characteristics, scalability, robustness, and parameter sensitivity under varying data conditions. Despite significant progress, challenges remain in handling noisy and high-dimensional data, determining the optimal number of clusters, and ensuring computational efficiency. Emerging directions such as hybrid frameworks, self-supervised learning, and multi-view clustering offer promising avenues for developing more adaptive and scalable clustering solutions.
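To make the difference between internal and external evaluation criteria concrete, the sketch below computes one representative of each. These are illustrative choices, not necessarily the metrics or formulations used in the survey. The silhouette coefficient is an internal criterion: it needs only the data and the predicted labels. The Rand index is an external criterion: it compares the predicted partition against ground-truth labels. Euclidean distance is assumed as the similarity measure, and the helper names `silhouette` and `rand_index` are hypothetical. The functions are pure Python, need Python 3.8+ for `math.dist`, and assume at least two clusters in the prediction.

```python
import math
from itertools import combinations


def silhouette(points, labels):
    """Mean silhouette coefficient (internal criterion).

    Assumes Euclidean distance and at least two distinct labels.
    Singleton clusters contribute a score of 0 by convention.
    """
    labels = list(labels)
    n = len(points)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


def rand_index(labels_true, labels_pred):
    """Rand index (external criterion): the fraction of point pairs on
    which the two partitions agree (same cluster in both, or different
    clusters in both)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6)]
    truth = ["a", "a", "b", "b"]
    good = [0, 0, 1, 1]  # matches the natural grouping
    bad = [0, 1, 0, 1]   # mixes the two groups
    print(silhouette(points, good), rand_index(truth, good))
    print(silhouette(points, bad), rand_index(truth, bad))
```

On this toy data the well-separated labeling scores a silhouette of about 0.86 and a Rand index of 1.0. The mixed labeling has a negative silhouette and a Rand index of 1/3. The silhouette reached its verdict without ever seeing `truth`, which is what makes it usable when no ground truth exists.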
