GAN_BERT: An Advanced Neural Architecture for Effective Fraud Detection on Imbalanced Datasets
Abstract
In the financial sector, fraud detection has long posed a significant challenge to researchers, particularly when the dataset is highly imbalanced. Because fraudulent activity is rare, heavily imbalanced datasets are common, and traditional machine learning models often fail to generalize to the minority class. To address this challenge, we introduce GAN_BERT, a hybrid neural architecture that combines a Conditional Tabular Generative Adversarial Network (CTGAN) for synthetic data generation with a Bidirectional Encoder Representations from Transformers (BERT) classifier. Within GAN_BERT, each component targets a different issue. The CTGAN module captures the intrinsic patterns underlying fraud records and generates high-quality synthetic samples for training. The data loader module combines the training data and the synthetic samples in a stratified way, which substantially increases the model's exposure to the minority class. Finally, the classifier module learns the temporal relationships among fraudulent transactions and identifies fraudulent activity accurately while maintaining a low false-alarm rate. Compared with other state-of-the-art models on benchmark datasets, GAN_BERT shows noticeable improvements in precision, recall, and F1-score for the minority class. We propose GAN_BERT as a robust, flexible, and scalable solution for fraud detection, especially on imbalanced datasets. Our findings may also apply to other domains facing similar challenges.
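The abstract does not say exactly how the data loader mixes real and synthetic samples. The Python sketch below assumes one plausible scheme. Synthetic fraud rows, which GAN_BERT would obtain from its CTGAN module but which are just a placeholder list here, are appended to the real training rows. Each mini-batch is then built with a fixed share of fraud rows, and minority rows are reused cyclically when there are too few to fill every batch. The names `augment_minority`, `stratified_batches`, and `fraud_fraction` are hypothetical and are not part of GAN_BERT.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# A row is (feature_vector, label); label 1 = fraud (minority), 0 = legitimate.
Row = Tuple[Sequence[float], int]


def augment_minority(real: List[Row], synthetic_fraud: List[Row]) -> List[Row]:
    """Append synthetic fraud rows to the real training rows.

    Hypothetical helper: in GAN_BERT the synthetic rows would come from the
    CTGAN module; here they are simply passed in as a list.
    """
    if any(label != 1 for _, label in synthetic_fraud):
        raise ValueError("synthetic rows are expected to be fraud (label 1)")
    return list(real) + list(synthetic_fraud)


def stratified_batches(rows: List[Row], batch_size: int, fraud_fraction: float,
                       seed: int = 0) -> Iterator[List[Row]]:
    """Yield shuffled mini-batches that each hold a fixed share of fraud rows.

    Assumed scheme: legitimate rows are used at most once per pass, and fraud
    rows are reused cyclically (simple oversampling) to fill every batch.
    """
    if not 0.0 < fraud_fraction < 1.0:
        raise ValueError("fraud_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    fraud = [r for r in rows if r[1] == 1]
    legit = [r for r in rows if r[1] == 0]
    n_fraud = max(1, round(batch_size * fraud_fraction))
    n_legit = batch_size - n_fraud
    if not fraud or n_legit < 1:
        raise ValueError("need at least one fraud row and one legitimate slot per batch")
    rng.shuffle(fraud)
    rng.shuffle(legit)
    for b in range(len(legit) // n_legit):
        # Take n_fraud consecutive fraud rows, wrapping around the fraud pool.
        batch = [fraud[(b * n_fraud + i) % len(fraud)] for i in range(n_fraud)]
        # Take the next block of unused legitimate rows.
        batch += legit[b * n_legit:(b + 1) * n_legit]
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    # Toy data: 10 real fraud rows out of 1,000, plus 40 placeholder synthetic fraud rows.
    real = [([float(i)], 1 if i < 10 else 0) for i in range(1000)]
    synthetic = [([-float(i)], 1) for i in range(1, 41)]
    rows = augment_minority(real, synthetic)
    batches = list(stratified_batches(rows, batch_size=32, fraud_fraction=0.25))
    print(len(batches), sum(label for _, label in batches[0]))  # 41 8
```

With 990 legitimate rows and 24 legitimate slots per batch, the toy run yields 41 batches, each containing 8 fraud rows. Fixing the fraud share per batch, rather than relying on the raw class ratio, is one simple way to guarantee that every gradient step sees minority examples. The right fraction would be a tuning choice and is not taken from the source.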