FWB-SMOTE: Feature-Weighted Borderline Synthetic Minority Oversampling Technique for Class Imbalance Problems
Abstract
SMOTE is a classic method for handling imbalanced datasets, but when generating minority-class samples it can introduce noisy samples, blur class boundaries, and treat all features as equally important. To improve the quality of oversampling, this paper proposes the FWB-SMOTE (Feature-Weighted Borderline-SMOTE) algorithm. The algorithm first computes the Euclidean distance matrix between minority-class samples and the entire dataset, and filters out noise based on the class proportions of each sample's neighbors. It then uses the feature split gain of XGBoost to quantify feature importance, normalizes these weights, and maps them onto the feature dimensions, so that key features dominate sample generation while interference from redundant features is suppressed. Finally, the k-nearest neighbor algorithm identifies minority-class boundary samples in the weighted feature space, and the selected boundary samples are oversampled with the feature weights incorporated into the interpolation. Comparative experiments on 44 KEEL datasets verify that the FWB-SMOTE algorithm significantly outperforms 8 comparison algorithms, including SMOTE and Borderline-SMOTE, in terms of AUC and G-mean on DT, SVM, and KNN classifiers, confirming its effectiveness in improving the quality of generated minority-class samples.
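The pipeline described in the abstract (noise filtering by neighbor class proportion, feature weighting, borderline selection in the weighted space, and weighted interpolation) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: in particular, the paper derives feature weights from XGBoost split gains, while here a simple class-mean separation score stands in for those gains, and the neighbor-proportion thresholds are assumptions.

```python
import numpy as np

def fwb_smote_sketch(X, y, minority=1, k=5, n_new=None, seed=None):
    """Hypothetical sketch of an FWB-SMOTE-style pipeline.
    The feature-weight step is a stand-in for XGBoost split gain."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.where(y == minority)[0]

    def knn(point, data, k):
        # indices of the k nearest neighbors, excluding the point itself
        d = np.linalg.norm(data - point, axis=1)
        return np.argsort(d)[1:k + 1]

    # 1) Noise filter: drop minority points whose k nearest neighbors
    #    in the whole dataset are all majority-class.
    keep = [i for i in minority_idx
            if np.sum(y[knn(X[i], X, k)] != minority) < k]

    # 2) Feature weights (stand-in for XGBoost split gain):
    #    absolute class-mean difference, normalized to sum to 1.
    gain = np.abs(X[y == minority].mean(0) - X[y != minority].mean(0))
    w = gain / gain.sum()

    # 3) Borderline ("danger") selection in the weighted feature space:
    #    points with between k/2 and k-1 majority-class neighbors.
    Xw = X * w
    danger = [i for i in keep
              if k / 2 <= np.sum(y[knn(Xw[i], Xw, k)] != minority) < k]
    if not danger:          # fall back to all retained minority points
        danger = keep

    # 4) Weighted interpolation: large-weight features move more.
    n_new = n_new or len(keep)
    synth = []
    for _ in range(n_new):
        i = danger[rng.integers(len(danger))]
        j = keep[rng.integers(len(keep))]
        lam = rng.random()
        synth.append(X[i] + lam * w * (X[j] - X[i]))
    return np.array(synth), w
```

A usage example: on a 40-vs-10 two-class dataset, `fwb_smote_sketch(X, y, minority=1, k=5, n_new=20)` returns 20 synthetic minority samples plus the normalized feature-weight vector.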