Fraud Detection in Online Transactions: Toward Hybrid Supervised–Unsupervised Learning Pipelines
Abstract
Fraud detection in online transactions is challenging because fraudulent events are rare and fraud strategies evolve over time. This study compares three supervised machine learning models (Logistic Regression, Random Forest, and LightGBM) for detecting fraudulent transactions in an extremely imbalanced dataset. We evaluate each model with both standardized and raw features, using macro-averaged metrics and AUC. Ensemble-based models, particularly LightGBM, significantly outperform the linear baseline and are robust to feature scaling. We also assess K-Means clustering as an unsupervised baseline and observe that it fails to meaningfully separate fraud cases, suggesting the need for more informative features or hybrid learning approaches. These results offer practical guidance on model selection, preprocessing, and the trade-off between precision and recall in real-world fraud detection systems.
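To illustrate why macro-averaged metrics suit an extremely imbalanced setting, the following is a minimal sketch in plain Python. The labels are toy data invented for illustration (990 legitimate and 10 fraudulent transactions), not results from the study, and the helper names `per_class_prf`, `macro_prf`, and `accuracy` are hypothetical. It compares a classifier that never flags fraud with one that catches most fraud at the cost of some false alarms.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_prf(
    y_true: Sequence[int], y_pred: Sequence[int], label: int
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_prf(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int] = (0, 1)
) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall, and F1."""
    scores = [per_class_prf(y_true, y_pred, label) for label in labels]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Toy labels (assumption, for illustration only): 0 = legitimate, 1 = fraud.
y_true: List[int] = [0] * 990 + [1] * 10

# Classifier A never flags fraud.
always_legit: List[int] = [0] * 1000

# Classifier B raises 12 false alarms and catches 8 of the 10 fraud cases.
detector: List[int] = [1] * 12 + [0] * 978 + [1] * 8 + [0] * 2

for name, y_pred in [("always_legit", always_legit), ("detector", detector)]:
    p, r, f = macro_prf(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
          f"macro_precision={p:.3f} macro_recall={r:.3f} macro_f1={f:.3f}")
```

In this toy setup the classifier that never flags fraud has the higher accuracy (0.990 versus 0.986), yet its macro recall is only 0.5 because the fraud class contributes a recall of zero. Macro averaging weights both classes equally, so it exposes a failure on the rare class that accuracy hides.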