AOGMMNC: Adaptive and Robust General-Purpose Clustering for Data Partition
Abstract
The Gaussian Mixture Model (GMM) combined with Spectral Clustering (SC) is an innovative clustering methodology that has been successfully applied to various clustering problems, particularly those involving complex shapes and nonlinear structures. However, several challenges arise in determining the optimal GMM and in coupling it with SC: (1) determining the optimal GMM can be costly, as it requires evaluating candidate models across a wide range of parameters; (2) the random initialization of the Expectation-Maximization (EM) algorithm may produce unstable results; (3) the effectiveness of SC depends heavily on the adjacency matrix generated from the GMM; and (4) clustering becomes particularly difficult when the optimal GMM consists of only one mixture component. To address these challenges, we first implement a modified incremental GMM together with the EM algorithm to determine the optimal number of mixture components, allowing the GMM to adapt to the dataset. Next, we propose a novel initialization method for the EM algorithm, called KGMC, which optimizes the GMM via entropy-penalized maximum likelihood. Furthermore, we introduce a revised adjacency matrix (A) and combine it with a fast algorithm for solving the normalized cut (FCD) to merge the mixture components of the optimal GMM for data partitioning. Additionally, we propose a probability-partition-based multi-cluster concept to handle clustering tasks in which the optimal GMM contains only one mixture component. Rigorous comparisons with general-purpose and specialized clustering methods on simulated and real-world datasets consistently demonstrate the high performance of our clustering algorithm across all tested datasets.
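The general GMM-plus-spectral-clustering idea (fit more Gaussian components than there are clusters, build an affinity between components, then merge components with a normalized cut) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that differ from the paper: the number of components `k` is fixed rather than chosen incrementally; EM uses a deterministic quantile initialization, not KGMC; the affinity `A[j, k] = sum_i r_ij * r_ik` built from posterior responsibilities is a hypothetical stand-in for the paper's revised adjacency matrix; and the merge is a plain two-way Shi-Malik spectral relaxation, not the FCD solver. All function names are hypothetical.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=200):
    """Plain EM for a 1-D Gaussian mixture; returns the n x k responsibilities
    from the final E-step. Quantile initialization is used only for
    reproducibility (assumption: not the paper's KGMC initialization)."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() / k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log of w_k * N(x_i | mu_k, var_k), normalized per point.
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return resp


def component_adjacency(resp):
    """Hypothetical component affinity A[j, k] = sum_i r_ij * r_ik:
    components that share many points get a large weight."""
    a = resp.T @ resp
    np.fill_diagonal(a, 0.0)  # drop self-affinity
    return a


def ncut_bipartition(a):
    """Two-way normalized cut (Shi-Malik relaxation): split components by
    the sign of the generalized Fiedler vector of (D - A) f = lambda D f."""
    d = a.sum(axis=1)
    d_is = 1.0 / np.sqrt(d)
    l_sym = np.eye(len(a)) - d_is[:, None] * a * d_is[None, :]
    _, vecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    f = d_is * vecs[:, 1]            # map back from the symmetric form
    return (f > 0).astype(int)


def gmm_spectral_cluster(x, k=4):
    """Fit an over-complete GMM, merge its components by normalized cut,
    and give each point the cluster of its most probable component."""
    resp = fit_gmm_1d(x, k)
    comp_label = ncut_bipartition(component_adjacency(resp))
    return comp_label[resp.argmax(axis=1)]


# Two flat (non-Gaussian) groups of points separated by a gap.
x = np.concatenate([np.linspace(0.0, 3.0, 150), np.linspace(4.0, 7.0, 150)])
labels = gmm_spectral_cluster(x, k=4)
print(labels[:3], labels[-3:])
```

With four components on two flat groups, EM places two Gaussians on each group. Components on the same group share many points and receive a large affinity, while the cross-group affinity is tiny, so the normalized cut merges the components pairwise. Which group receives label 0 depends on the arbitrary sign of the eigenvector.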