Fuzzy Feature Selection Using Fuzzy C-Means Clustering and Recursive Feature Elimination (FCM-RFE)
Abstract
In machine learning, feature selection is crucial for reducing dimensionality and computing costs, improving generalization, and making models easier to interpret. Traditional approaches often struggle with high-dimensional data because of multicollinearity and redundancy among features. To address these problems, we propose a hybrid framework, Fuzzy Feature Selection using Fuzzy C-Means Clustering and Recursive Feature Elimination (FCM-RFE), which combines fuzzy logic with filter and wrapper approaches. First, fuzzy C-means clustering groups related features into soft clusters to capture complex relationships among them. Then, within each cluster, Recursive Feature Elimination with Random Forest (RFE-RF) iteratively eliminates the less significant features. For more precise selection, a fuzzy membership-based scoring system ranks features by the strength of their association with their cluster. We evaluated the approach on 18 benchmark datasets with KNN and SVM classifiers, reporting accuracy, precision, recall, F1-score, specificity, and AUC-ROC. The proposed approach maintained or improved performance while significantly reducing dimensionality, selecting on average only 4.1% of the original features. The maximum accuracy was 92.75% for SVM with FCM-RFE and 89% for KNN. The method outperformed eight state-of-the-art techniques while remaining computationally efficient. Because it improves interpretability and scalability as well as classification performance, the framework is suitable for high-dimensional data analysis in various disciplines.
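The soft-clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, fuzzifier, number of clusters, or scoring formula, so the Euclidean distance on standardized feature columns, the fuzzifier `m=2.0`, the helper names `fuzzy_c_means` and `feature_memberships`, the synthetic data, and the use of the maximum membership as a feature score are all assumptions made for illustration. The RFE-RF stage is not shown; it would then be run separately on the features assigned to each cluster.

```python
import numpy as np


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on the rows of `points` (shape: n_points x n_dims).

    Returns (centers, memberships); each row of memberships sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial membership matrix, normalized so each row sums to 1.
    u = rng.random((points.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means of the points weighted by membership^m.
        weights = u ** m
        centers = (weights.T @ points) / weights.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: proportional to dist^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


def feature_memberships(X, n_clusters, **fcm_kwargs):
    """Soft-cluster the columns (features) of X (shape: n_samples x n_features)."""
    # Standardize each feature, then treat each feature column as one point.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, u = fuzzy_c_means(Z.T, n_clusters, **fcm_kwargs)
    return u


# Synthetic data (assumption): two groups of three strongly correlated features.
rng = np.random.default_rng(42)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
X = np.column_stack(
    [base_a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [base_b + 0.1 * rng.normal(size=200) for _ in range(3)]
)

memberships = feature_memberships(X, n_clusters=2)
labels = memberships.argmax(axis=1)  # cluster with the strongest membership
scores = memberships.max(axis=1)     # assumed score: strength of that membership
print(np.round(memberships, 3))
print(labels)
```

In this toy setting the redundant features in each group end up with their strongest membership in the same cluster, which is the property that lets a per-cluster elimination step prune near-duplicates. One option for that stage, again as an assumption, is scikit-learn's `RFE` with a `RandomForestClassifier` estimator.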