Self-Supervised Learning for Financial Statement Fraud Detection with Limited and Imbalanced Data

Abstract

This study addresses three challenges in financial statement fraud detection: scarce fraudulent samples, complex data distributions, and the limited adaptability of traditional methods. To tackle them, it proposes a self-supervised learning algorithm. The approach first standardizes multidimensional financial indicators to remove scale differences, then uses an encoder to construct latent representations that capture high-order nonlinear relationships among the indicators. A reconstruction task serves as an auxiliary signal: a decoder reconstructs the input from the latent representation, and minimizing the reconstruction error improves the fidelity of the representations. In parallel, a classification module distinguishes normal from fraudulent statements, and the model jointly optimizes the reconstruction and classification losses to improve both feature completeness and discriminative ability. Experiments on a public financial fraud dataset show that the proposed method significantly outperforms existing baselines on Precision, Recall, F1-Score, and AUC, with particular strength in recognizing the minority (fraudulent) class under imbalanced and limited data. Additional sensitivity experiments show that the method remains stable and robust across different optimizers and imbalance ratios, confirming its effectiveness in complex financial environments. Overall, the algorithm provides an efficient and reliable approach to fraud detection, with distinctive advantages in accuracy and adaptability.
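The joint objective described above can be sketched in NumPy as a total loss $L = L_{\text{cls}} + \lambda L_{\text{rec}}$, trained by plain gradient descent. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the architecture, loss weights, or how class imbalance is handled. The single tanh encoder layer, the linear decoder, the logistic classification head, the mean-squared reconstruction error, the weight `lam` ($\lambda$), the optional class weight `pos_weight`, and all function names are hypothetical choices made here for illustration.

```python
import numpy as np

def standardize(X):
    """Z-score each indicator column so that scale differences do not dominate."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (X - mu) / sigma, mu, sigma

def sigmoid(s):
    # Numerically stable logistic function, using sigmoid(s) = (1 + tanh(s/2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * s))

def init_params(d, k, rng):
    # Hypothetical architecture: one tanh encoder layer, linear decoder, logistic head.
    return {
        "W1": rng.normal(0, 0.1, (d, k)), "b1": np.zeros(k),  # encoder
        "W2": rng.normal(0, 0.1, (k, d)), "b2": np.zeros(d),  # decoder
        "w3": rng.normal(0, 0.1, k), "b3": 0.0,               # classifier head
    }

def forward(P, Xs):
    H = np.tanh(Xs @ P["W1"] + P["b1"])   # latent representation
    Xhat = H @ P["W2"] + P["b2"]          # reconstruction of the input
    p = sigmoid(H @ P["w3"] + P["b3"])    # probability that a statement is fraudulent
    return H, Xhat, p

def joint_loss_and_grads(P, Xs, y, lam=1.0, pos_weight=1.0):
    n, d = Xs.shape
    H, Xhat, p = forward(P, Xs)
    c = np.where(y == 1, pos_weight, 1.0)  # assumed per-sample class weights
    pc = np.clip(p, 1e-7, 1 - 1e-7)        # avoid log(0)
    l_cls = -np.mean(c * (y * np.log(pc) + (1 - y) * np.log(1 - pc)))
    l_rec = np.mean((Xhat - Xs) ** 2)
    loss = l_cls + lam * l_rec

    # Backpropagate both losses; their gradients meet in the shared encoder.
    dXhat = lam * 2.0 * (Xhat - Xs) / (n * d)
    ds = c * (p - y) / n                   # gradient w.r.t. classifier logits
    g = {"W2": H.T @ dXhat, "b2": dXhat.sum(axis=0),
         "w3": H.T @ ds, "b3": ds.sum()}
    dH = dXhat @ P["W2"].T + np.outer(ds, P["w3"])
    dA = dH * (1.0 - H ** 2)               # derivative of tanh
    g["W1"] = Xs.T @ dA
    g["b1"] = dA.sum(axis=0)
    return loss, g

def train(X, y, k=8, lam=1.0, pos_weight=1.0, lr=0.3, epochs=300, seed=0):
    Xs, mu, sigma = standardize(X)
    rng = np.random.default_rng(seed)
    P = init_params(Xs.shape[1], k, rng)
    history = []
    for _ in range(epochs):
        loss, g = joint_loss_and_grads(P, Xs, y, lam, pos_weight)
        history.append(loss)
        for name in P:                      # full-batch gradient descent step
            P[name] = P[name] - lr * g[name]
    return P, mu, sigma, history
```

New statements should be scaled with the training `mu` and `sigma` before calling `forward`, so that the same standardization applies at inference time. Sharing one encoder between the two heads is what lets the reconstruction signal, which uses every sample regardless of label, shape the representation that the scarce fraudulent labels must separate.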