A Robust Clustering Framework Combining Minimum Description Length and Genetic Optimization
Abstract
Clustering algorithms are central to data analysis because they organize data into meaningful groups. However, each individual algorithm carries limitations and biases inherent in its clustering criterion, so no single method delivers optimal solutions across diverse datasets. To address this challenge, we propose a novel clustering method that combines the Minimum Description Length (MDL) principle with a genetic optimization algorithm. The method first generates an initial clustering solution using an ensemble clustering technique. This baseline is then refined by a genetic algorithm whose evaluation functions are grounded in the MDL principle. Through the MDL principle, the method incorporates external information from the input clusters while also adapting to the intrinsic properties of the dataset, which reduces the dependence of the final result on the input clusters and keeps the clustering process data-driven and robust. We evaluated the method on thirteen standard datasets using four widely recognized validation criteria: accuracy, normalized mutual information (NMI), Fisher score, and adjusted Rand index (ARI). Experimental results show that the proposed method consistently produces clusters with higher accuracy, greater stability, and less bias than traditional clustering methods. These results highlight the method's versatility across datasets with diverse characteristics.
By leveraging the strengths of MDL and genetic optimization, this study presents a robust and adaptable clustering framework that advances the field of data clustering, offering a reliable tool for handling complex and varied datasets.
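
The pipeline summarized above (an initial labelling, an MDL-based evaluation function, and genetic refinement) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify their encoding or genetic operators. Everything here is an assumption made for illustration: the spherical-Gaussian two-part code in `mdl_cost`, the elitist selection, uniform crossover, and random-reset mutation in `refine`, all parameter values, and the noisy labelling that stands in for the output of an ensemble clustering step.

```python
import numpy as np

def mdl_cost(X, labels):
    """Two-part description length (in nats) of X under a labelling.

    Assumption: each cluster is modelled as a spherical Gaussian.
    """
    n, d = X.shape
    clusters = np.unique(labels)
    counts = np.array([(labels == c).sum() for c in clusters])
    # Model cost: one mean vector and one variance per cluster,
    # charged at 0.5 * log(n) nats per parameter.
    model_cost = 0.5 * len(clusters) * (d + 1) * np.log(n)
    # Cost of each point's cluster index under the empirical frequencies.
    label_cost = -np.sum(counts * np.log(counts / n))
    # Cost of the data given the model: Gaussian negative log-likelihood
    # evaluated at the maximum-likelihood mean and variance.
    data_cost = 0.0
    for c, m in zip(clusters, counts):
        var = max(X[labels == c].var(axis=0).mean(), 1e-6)  # floor avoids log(0)
        data_cost += 0.5 * m * d * (np.log(2 * np.pi * var) + 1.0)
    return model_cost + label_cost + data_cost

def refine(X, init_labels, k, pop_size=20, generations=40, mut_rate=0.02, seed=0):
    """Refine a labelling with a small elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = len(init_labels)

    def mutate(lab):
        # Reassign a small random subset of points to random clusters.
        lab = lab.copy()
        mask = rng.random(n) < mut_rate
        lab[mask] = rng.integers(0, k, mask.sum())
        return lab

    # Population: the initial solution plus mutated copies of it, so
    # cluster ids stay aligned and uniform crossover is meaningful.
    pop = [init_labels.copy()] + [mutate(init_labels) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda lab: mdl_cost(X, lab))
        elite = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            mask = rng.random(n) < 0.5  # uniform crossover
            children.append(mutate(np.where(mask, elite[i], elite[j])))
        pop = elite + children
    return min(pop, key=lambda lab: mdl_cost(X, lab))

# Toy data: two well-separated blobs; the "initial solution" is the true
# labelling with about 15% of the labels flipped.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(6, 1, (60, 2))])
truth = np.repeat([0, 1], 60)
init = truth.copy()
flip = rng.random(120) < 0.15
init[flip] = 1 - init[flip]

refined = refine(X, init, k=2)
print("initial cost:", mdl_cost(X, init))
print("refined cost:", mdl_cost(X, refined))
```

Because the best individual always survives selection, the refined labelling in this sketch never has a higher description length than the initial one. How close it comes to the true grouping depends on the assumed encoding and on the search budget.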