Comparative Analysis of Supervised Learning Models for Detecting Credit Card and Bank Account Fraud
Abstract
The purpose of this study is to investigate the efficacy of three supervised learning models, Logistic Regression, Random Forest, and XGBoost, on two financial fraud detection datasets that differ in construction and class distribution. The Credit Card Fraud Detection Dataset (Kaggle, 2023) is a synthetic dataset that has been artificially balanced to a 50:50 ratio of fraudulent to non-fraudulent observations, allowing model performance to be evaluated under ideal conditions. In contrast, the Bank Account Fraud Dataset (NeurIPS, 2022) reflects real-world financial behavior and exhibits extreme class imbalance, with only approximately 1% of observations labeled as fraudulent (Jesus et al., 2022). A single pipeline was constructed using stratified 60/20/20 splits, with SMOTE applied only to the training set; evaluation metrics included F1-score and AUC-ROC. The results show near-perfect outcomes on the balanced synthetic dataset but substantial performance degradation on the real-world imbalanced dataset. XGBoost consistently performed best on the imbalanced dataset, achieving an F1-score of 23.4% and an AUC of 89.3%. These results are consistent with published benchmarks indicating that F1-scores in the 15 to 25% range represent strong outcomes in practical fraud detection. The findings underscore the critical impact of class imbalance and dataset realism on the performance of supervised models, and motivate future work on techniques such as cost-sensitive learning, explainability, and temporal modeling of financial data to achieve generalization in operational settings.
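To make the described pipeline concrete, the sketch below shows one plausible implementation of the stratified 60/20/20 split, SMOTE applied only to the training fold, and F1/AUC-ROC evaluation, assuming scikit-learn, imbalanced-learn, and xgboost. The file name, label column, and model hyperparameters are hypothetical placeholders, not the authors' actual code.

```python
# Minimal sketch of the described pipeline: stratified 60/20/20 split,
# SMOTE applied only to the training fold, evaluation via F1 and AUC-ROC.
# The dataset path and label column ("Class") are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

df = pd.read_csv("creditcard.csv")  # hypothetical file name
X, y = df.drop(columns=["Class"]), df["Class"]

# Stratified 60/20/20 split: carve off 40%, then halve it into val/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# SMOTE is fit on the training split only, so no synthetic samples leak
# into the validation or test data.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train_res, y_train_res)
    proba = model.predict_proba(X_test)[:, 1]
    preds = (proba >= 0.5).astype(int)
    print(f"{name}: F1={f1_score(y_test, preds):.3f}, "
          f"AUC-ROC={roc_auc_score(y_test, proba):.3f}")
```

Applying SMOTE after the split, rather than to the full dataset, is what keeps the reported validation and test metrics honest on imbalanced data, since oversampling before splitting would leak near-duplicate synthetic minority samples across folds.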