Enhancing credit card fraud detection using DBSCAN-augmented disjunctive voting ensemble

Abstract

Credit card fraud detection remains a critical yet challenging task due to the extreme class imbalance inherent in transaction datasets, where fraudulent activities constitute only a small fraction of the total records. To address this imbalance and enhance the detection of rare fraud instances, this study proposes a novel hybrid framework that integrates density-based clustering for data augmentation with an ensemble classification strategy optimized for high recall. In the preprocessing stage, the framework employs density-based spatial clustering of applications with noise (DBSCAN) to identify minority-class clusters and synthetically augment the fraud class. This preserves the intrinsic structure of fraudulent patterns while increasing their representation in the training set. Subsequently, an ensemble model comprising random forest (RF), K-nearest neighbors (KNN), and support vector machine (SVM) is constructed, with final predictions generated using a disjunctive voting ensemble (DVE) strategy. In this scheme, a transaction is labeled fraudulent if any of the base classifiers predicts it as such, a permissive approach that prioritizes recall and minimizes the risk of undetected fraud. Extensive experiments were conducted on three publicly available credit card fraud datasets containing transaction records from European cardholders in 2023, providing a realistic evaluation scenario. Implemented in the Anaconda Navigator (Spyder-Python 3.12) environment, the framework achieved both computational efficiency and robust performance. The findings demonstrate that DBSCAN-based augmentation effectively enhances minority-class representation while preserving fraud patterns, and the DVE strategy ensures high recall by substantially reducing false negatives. 
Comparative analysis confirms that the framework significantly outperforms traditional ensembles and single classifiers, achieving recall up to 99.5%, F1-scores up to 99.8%, and consistently maintaining 100% accuracy and precision. Overall, the study highlights the robustness, scalability, and interpretability of the proposed model, marking a significant advancement in developing adaptive fraud detection systems for real-world financial transactions.
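
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how synthetic fraud samples are generated, so the SMOTE-style interpolation between members of the same DBSCAN cluster is an assumption, as are the helper names dbscan_augment and disjunctive_vote, the values of eps and min_samples, the default scikit-learn classifier settings, and the toy two-dimensional data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def dbscan_augment(X_fraud, n_new, eps=0.5, min_samples=5, seed=0):
    """Return n_new synthetic fraud rows built inside DBSCAN clusters.

    Assumption: each new row is a random point on the segment between two
    rows of the same cluster (SMOTE-style). The abstract does not specify
    the generation rule. Noise points (label -1) are never used.
    """
    X_fraud = np.asarray(X_fraud, dtype=float)
    rng = np.random.default_rng(seed)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_fraud)
    cluster_ids = [c for c in np.unique(labels) if c != -1]
    if not cluster_ids:
        # No dense region found: nothing to augment from.
        return np.empty((0, X_fraud.shape[1]))
    synthetic = []
    for _ in range(n_new):
        members = X_fraud[labels == rng.choice(cluster_ids)]
        a, b = members[rng.integers(len(members), size=2)]
        synthetic.append(a + rng.random() * (b - a))
    return np.asarray(synthetic).reshape(-1, X_fraud.shape[1])


def disjunctive_vote(models, X):
    """Label a row as fraud (1) if ANY fitted base model predicts fraud."""
    votes = np.column_stack([m.predict(X) for m in models])
    return votes.any(axis=1).astype(int)


if __name__ == "__main__":
    # Toy data (hypothetical): 500 legitimate rows and 20 fraud rows.
    rng = np.random.default_rng(42)
    X_legit = rng.normal(0.0, 1.0, size=(500, 2))
    X_fraud = rng.normal(4.0, 0.3, size=(20, 2))

    # Stage 1: augment the fraud class before training.
    X_new = dbscan_augment(X_fraud, n_new=100, eps=0.5, min_samples=3)
    X_train = np.vstack([X_legit, X_fraud, X_new])
    y_train = np.r_[np.zeros(len(X_legit)),
                    np.ones(len(X_fraud) + len(X_new))].astype(int)

    # Stage 2: fit RF, KNN and SVM, then combine them with a logical OR.
    models = [
        RandomForestClassifier(n_estimators=50, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
        SVC(),
    ]
    for model in models:
        model.fit(X_train, y_train)

    y_pred = disjunctive_vote(models, X_train)
    print("training recall:", recall_score(y_train, y_pred))
```

Because the combined label is a logical OR, the ensemble misses a fraud case only when all three base classifiers miss it. The same rule also means that a false alarm from any single classifier becomes a false alarm of the ensemble, so precision should be checked on held-out data rather than on the training rows used in this toy run.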
