Self-Supervised Learning for Financial Statement Fraud Detection with Limited and Imbalanced Data

Jianlin Lai
Anzhuo Xie
Hanrui Feng
Yi Wang
Ruoyi Fang

Abstract

This study addresses the challenges of scarce fraudulent samples, complex data distributions, and the limited adaptability of traditional methods in financial statement fraud detection by proposing a self-supervised learning algorithm. The approach first standardizes multidimensional financial indicators to mitigate scale differences, then employs an encoder to construct latent representations that capture high-order nonlinear relationships among indicators. A reconstruction task is introduced as an auxiliary signal, where a decoder approximates the input and minimizes reconstruction error to enhance the fidelity of representations. In parallel, a classification module distinguishes normal from fraudulent statements, with the model jointly optimizing reconstruction and classification losses to improve both feature completeness and discriminative ability. Experiments on a public financial fraud dataset show that the proposed method significantly outperforms existing baselines on Precision, Recall, F1-Score, and AUC, with particular strength in minority class recognition under imbalanced and limited data. Additional sensitivity experiments demonstrate that the method remains stable and robust across variations in optimizer type and imbalance ratios, confirming its effectiveness in complex financial environments. Overall, the algorithm provides an efficient and reliable pathway for fraud detection and exhibits distinctive advantages in accuracy and adaptability.
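
The pipeline described in the abstract (standardize the indicators, encode them, reconstruct the input, classify, and optimize both losses jointly) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the architecture or the loss functions, so the single-layer tanh encoder, linear decoder, logistic classification head, mean-squared reconstruction error, binary cross-entropy, the weighting parameter `lam`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Z-score each indicator (column) to mitigate scale differences."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps)

def init_params(n_features, n_latent, rng, scale=0.1):
    """Random weights for a one-layer encoder, decoder and classifier head.
    The layer sizes and initialisation are assumptions, not from the source."""
    return {
        "W_enc": rng.normal(0.0, scale, (n_features, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, scale, (n_latent, n_features)),
        "b_dec": np.zeros(n_features),
        "w_cls": rng.normal(0.0, scale, n_latent),
        "b_cls": 0.0,
    }

def forward(params, X):
    """Encode to a latent vector, then reconstruct and classify from it."""
    z = np.tanh(X @ params["W_enc"] + params["b_enc"])   # latent representation
    x_hat = z @ params["W_dec"] + params["b_dec"]        # reconstruction of the input
    logits = z @ params["w_cls"] + params["b_cls"]
    p_fraud = 1.0 / (1.0 + np.exp(-logits))              # estimated fraud probability
    return z, x_hat, p_fraud

def joint_loss(params, X, y, lam=1.0, eps=1e-12):
    """Classification loss plus lam times reconstruction loss (lam is assumed)."""
    _, x_hat, p = forward(params, X)
    rec = np.mean((X - x_hat) ** 2)                      # mean-squared reconstruction error
    cls = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return cls + lam * rec, rec, cls

# Synthetic stand-in data: 200 statements, 4 indicators on very different
# scales, with roughly 5% labelled as fraudulent (all values are made up).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[1e6, 0.3, 12.0, 5e3],
                   scale=[2e5, 0.1, 3.0, 1e3],
                   size=(200, 4))
y = (rng.random(200) < 0.05).astype(float)

X = standardize(X_raw)
params = init_params(n_features=4, n_latent=2, rng=rng)
total, rec, cls = joint_loss(params, X, y, lam=1.0)
print(f"joint={total:.4f}  reconstruction={rec:.4f}  classification={cls:.4f}")
```

In a full training loop, the gradient of the joint loss would update the encoder from both the reconstruction and the classification terms, which is the mechanism the abstract credits for improving feature completeness and discriminative ability together.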

Version published to 10.20944/preprints202511.0452.v1 on Nov 10, 2025

Related articles

Comparative Performance of Deep Learning Models for Financial Statement Fraud Detection in an Imbalanced Classification Setting

This article has 2 authors:
1. Tsolmon Sodnomdavaa
2. Lkhamdulam Ganbat
This article has no evaluations. Latest version: Jan 7, 2026

Mining Financial Data for Fraud Detection using Ensemble Learning and Outlier Detection

This article has 2 authors:
1. Manimegalai R
2. Vijayalaskhmi P
This article has no evaluations. Latest version: Dec 10, 2025

Generative Distribution Modeling for Credit Card Risk Identification under Noisy and Imbalanced Transactions

This article has 6 authors:
1. Zhen Xu
2. Kewei Cao
3. Yihan Zheng
4. Mingfan Chang
5. Xinyi Liang
6. Jialu Xia
This article has no evaluations. Latest version: Dec 25, 2025
