DPDO: Dynamic Poisson Disk Oversampling based on minority clusters within a circular region for the class imbalance problem
Abstract
In classification tasks, the number of samples in different classes may differ significantly, a phenomenon known as the class imbalance problem. A common approach to this issue is the Synthetic Minority Oversampling Technique (SMOTE), which rebalances the data distribution rather than modifying the classifier. However, many SMOTE-based variants still rely solely on local linear interpolation between a sample and its nearest neighbors. They neglect the influence of the global neighborhood structure on sample synthesis, which limits the diversity and authenticity of the generated samples. To overcome this limitation, and inspired by the concept of Poisson Disk Sampling, we propose a Dynamic Poisson Disk Oversampling algorithm (DPDO) that combines global neighborhood awareness with density-adaptive sampling control. First, DPDO removes noise samples by calculating the total distance from each sample to its K nearest neighbors, and identifies clusters of points with similar features. Then, within these clusters, it dynamically expands the oversampling region according to their structural constraints, generating new samples that adequately reflect the features of the minority class. Extensive experiments on 27 benchmark datasets, comparing DPDO with ten representative oversampling baselines, show that DPDO achieves superior performance in terms of F1-score and G-mean. The results indicate that, by incorporating global neighborhood information, DPDO can effectively alleviate class imbalance and produce more realistic, uniform, and diverse minority samples.
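To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' DPDO implementation. It shows classic SMOTE interpolation next to Poisson-disk-style dart throwing inside a circular (ball-shaped) region, plus a noise filter based on each sample's total distance to its K nearest neighbors. Several choices here are assumptions, not taken from the paper: the quantile threshold for noise, centring the circle on the cluster mean, and the hypothetical radius-growth rule standing in for "dynamic expansion." The cluster-identification step is also omitted, so the whole cleaned minority set is treated as one cluster. All function names are illustrative.

```python
import numpy as np


def knn_distance_sum(X, k):
    """Total Euclidean distance from each row of X to its k nearest neighbors."""
    # Brute-force pairwise distances; adequate for small minority sets.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.sort(d, axis=1)[:, :k].sum(axis=1)


def remove_noise(X, k=5, quantile=0.9):
    """Drop points whose k-NN distance sum exceeds the given quantile.
    ASSUMPTION: the quantile threshold is hypothetical, not the paper's rule."""
    s = knn_distance_sum(X, k)
    return X[s <= np.quantile(s, quantile)]


def smote_sample(X, k, rng):
    """Classic SMOTE: interpolate between a random point and one of its k-NN."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    i = rng.integers(len(X))
    j = rng.choice(np.argsort(d[i])[:k])
    return X[i] + rng.random() * (X[j] - X[i])


def sample_in_ball(center, radius, rng):
    """Uniformly distributed point inside a d-dimensional ball."""
    direction = rng.normal(size=center.shape)
    direction /= np.linalg.norm(direction)
    return center + direction * radius * rng.random() ** (1.0 / center.size)


def poisson_disk_oversample(cluster, n_new, min_dist, rng,
                            max_tries=50, grow=1.2):
    """Dart-throwing Poisson disk sampling inside a circle around a cluster.

    ASSUMPTION (hypothetical rule): the circle starts at the cluster mean with
    radius equal to the farthest member (at least min_dist) and is enlarged by
    `grow` whenever `max_tries` consecutive candidates are rejected.
    """
    center = cluster.mean(axis=0)
    radius = max(np.linalg.norm(cluster - center, axis=1).max(), min_dist)
    points = list(cluster)  # existing samples also block nearby candidates
    new, tries = [], 0
    while len(new) < n_new:
        c = sample_in_ball(center, radius, rng)
        # Accept only if every existing or generated point is >= min_dist away.
        if np.linalg.norm(np.asarray(points) - c, axis=1).min() >= min_dist:
            points.append(c)
            new.append(c)
            tries = 0
        else:
            tries += 1
            if tries >= max_tries:
                radius *= grow  # expand the sampling region
                tries = 0
    return np.asarray(new)
```

A possible usage on toy 2-D data:

```python
rng = np.random.default_rng(0)
minority = rng.normal(size=(40, 2))
clean = remove_noise(minority, k=5)
synthetic = poisson_disk_oversample(clean, n_new=20, min_dist=0.2, rng=rng)
baseline = np.array([smote_sample(clean, 5, rng) for _ in range(20)])
```

Unlike SMOTE, whose samples always lie on segments between neighbors, the minimum-distance constraint spreads the synthetic points evenly. Growing the region only when sampling stalls is one simple way to adapt to local density.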