Confidence-Aware Pseudo-Labeling via Unsupervised Ensemble Consensus for Fraud Detection
Abstract
Detecting fraud in financial transactions is difficult because labeled datasets are scarce, which makes traditional supervised methods hard to apply. To overcome this problem, this study introduces a hybrid approach that requires no seed labels. Instead, pseudo-labels are generated from the unsupervised ensemble consensus of four anomaly detection models: One-Class Support Vector Machine (SVM), Isolation Forest, DBSCAN, and an Autoencoder. Transactions are labeled by majority voting across these models, and the resulting agreement scores provide a confidence-aware hierarchy for prioritizing investigations. The pseudo-labels are then used to train a supervised stacked ensemble of XGBoost, Random Forest, SVM, 1D CNN, and LSTM base learners, with Logistic Regression as the meta-learner. On a dataset of 2,512 bank transactions, the unsupervised ensemble flagged 181 anomalies (7.2%), and the stacked ensemble achieved 98.7% accuracy, 98.3% precision, 99.1% recall, and an F1 score of 98.7%. These findings show that fraud detection can be achieved reliably without seed labels, with interpretable pseudo-labels bridging the gap between unsupervised anomaly detection and supervised learning.
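The consensus step described above (majority voting over four detectors, with an agreement score per transaction) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each detector has already produced a 0/1 anomaly flag per transaction, for example by mapping the -1 outputs of scikit-learn's `OneClassSVM.predict`, `IsolationForest.predict`, or DBSCAN's `labels_` (where -1 marks noise) to 1, and by thresholding autoencoder reconstruction error. It also assumes that "majority" means strictly more than half of the detectors. The helper names `sklearn_to_flags` and `consensus_pseudo_labels`, and the tier names used for the investigation hierarchy, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class PseudoLabel:
    label: int        # 1 = suspected fraud, 0 = normal
    agreement: float  # fraction of detectors that flagged the transaction
    tier: str         # hypothetical investigation-priority tier


def sklearn_to_flags(predictions: Sequence[int]) -> List[int]:
    """Map scikit-learn style outputs (-1 = outlier/noise) to 0/1 flags."""
    return [1 if p == -1 else 0 for p in predictions]


def consensus_pseudo_labels(
    flags: Dict[str, Sequence[int]],
    min_votes: Optional[int] = None,
) -> List[PseudoLabel]:
    """Majority-vote pseudo-labels with per-transaction agreement scores.

    `flags` maps a detector name to one 0/1 anomaly flag per transaction.
    """
    names = list(flags)
    if not names:
        raise ValueError("need at least one detector")
    n_models = len(names)
    n_rows = len(flags[names[0]])
    if any(len(flags[name]) != n_rows for name in names):
        raise ValueError("all detectors must flag the same transactions")
    # Assumption: a strict majority, i.e. more than half of the detectors.
    threshold = min_votes if min_votes is not None else n_models // 2 + 1

    results = []
    for i in range(n_rows):
        votes = sum(int(flags[name][i]) for name in names)
        agreement = votes / n_models
        label = int(votes >= threshold)
        # Hypothetical tiers: unanimous anomalies first, then majority
        # anomalies, then minority flags that may deserve a second look.
        if label and votes == n_models:
            tier = "high"
        elif label:
            tier = "medium"
        elif votes > 0:
            tier = "review"
        else:
            tier = "normal"
        results.append(PseudoLabel(label, agreement, tier))
    return results


if __name__ == "__main__":
    flags = {
        "one_class_svm": sklearn_to_flags([-1, 1, -1, 1]),
        "isolation_forest": sklearn_to_flags([-1, 1, -1, -1]),
        "dbscan": sklearn_to_flags([-1, 0, 2, 0]),  # cluster ids; -1 = noise
        "autoencoder": [1, 0, 1, 0],  # assumed: error above a chosen threshold
    }
    for row in consensus_pseudo_labels(flags):
        print(row)
```

With four detectors the default threshold is three votes. Passing `min_votes=2` would also label 2-2 splits as anomalies and so flag more transactions. The resulting `label` column would then serve as the training target for the supervised stacked ensemble.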