Enhancing Credit Card Fraud Detection Using DBSCAN-Augmented Disjunctive Voting Ensemble
Abstract
Credit card fraud detection remains a critical yet challenging task because transaction datasets are extremely imbalanced: fraudulent activities constitute only a small fraction of the total records. To address this imbalance and improve the detection of rare fraud instances, this study proposes a hybrid framework that integrates density-based clustering for data augmentation with an ensemble classification strategy optimized for high recall. In the preprocessing stage, the method uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify minority-class clusters and synthetically augment the fraud class. This step aims to preserve the intrinsic structure of fraudulent patterns while increasing their representation in the training set. Subsequently, an ensemble of Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) classifiers is constructed. Final predictions are generated using a Disjunctive Voting Ensemble (DVE) strategy, in which a transaction is classified as fraudulent if any base classifier predicts it as such. This permissive voting mechanism prioritizes recall, reducing the risk of undetected fraudulent transactions. Extensive experiments were conducted on three publicly available, imbalanced credit card fraud datasets containing transaction records from European cardholders in 2023, providing a realistic fraud detection scenario. Training and testing were run efficiently in the Anaconda Navigator (Spyder, Python 3.12) environment. The comparative analysis shows that the proposed DBSCAN-augmented DVE framework delivers notable improvements over traditional ensemble approaches and single-model baselines, particularly in recall and F1-score, while preserving consistently high precision.
The approach proves to be both robust and interpretable, making it well suited to real-world fraud detection scenarios characterized by severe class imbalance. Recall and F1-score reach as high as 99.5% and 99.8%, respectively, while accuracy and precision remain at 100% across all imbalanced datasets. This study highlights the effectiveness of hybrid ensemble approaches in combating credit card fraud. The findings lay the groundwork for more resilient and adaptive fraud detection systems, which are crucial in countering the evolving tactics of fraudsters. The proposed model marks a significant advancement in securing financial transactions and mitigating risks in an increasingly digital economy.
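The disjunctive voting rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `disjunctive_vote` and `recall`, the label encoding (1 = fraud, 0 = legitimate), and the toy prediction vectors are all assumptions made for illustration. The sketch only shows how OR-combining the outputs of three base classifiers flags a transaction as soon as any one of them does.

```python
from typing import List, Sequence

FRAUD = 1   # assumed label encoding: 1 = fraud
LEGIT = 0   # assumed label encoding: 0 = legitimate


def disjunctive_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """OR-combine per-classifier label lists for the same transactions."""
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must score the same transactions")
    # A transaction is fraud if ANY base classifier says so.
    return [
        FRAUD if any(p[i] == FRAUD for p in predictions) else LEGIT
        for i in range(n)
    ]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true fraud cases that were flagged as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == FRAUD and p == FRAUD)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == FRAUD and p == LEGIT)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: six transactions, three of them fraudulent.
y_true = [1, 1, 1, 0, 0, 0]
rf_pred = [1, 0, 0, 0, 0, 0]    # stand-in for Random Forest output
knn_pred = [0, 1, 0, 0, 1, 0]   # stand-in for KNN output
svm_pred = [0, 0, 0, 0, 0, 0]   # stand-in for SVM output

dve_pred = disjunctive_vote([rf_pred, knn_pred, svm_pred])
print(dve_pred)
print(recall(y_true, rf_pred), recall(y_true, dve_pred))
```

Because the ensemble's set of flagged transactions is the union of the members' flagged sets, its recall can never be lower than that of any single member. The same union also collects every member's false positives (the fifth transaction in the toy data), so precision has to be checked separately.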