Financial Statement Fraud Detection Through an Integrated Machine Learning and Explainable AI Framework

Abstract

Financial statement fraud remains a substantial risk in environments marked by weak regulatory oversight and information asymmetry. This study develops a decision-centric framework that integrates machine learning, explainable artificial intelligence, and decision curve analysis to improve fraud detection under severe class imbalance. Using 969 firm-year observations from 132 Mongolian firms (2013–2024), we evaluate 21 financial ratios with models including Random Forest, XGBoost, LightGBM, MLP, TabNet, and a Stacking Ensemble trained with SMOTE and class-weighted learning. Performance was assessed using PR-AUC, F1-score, Recall, and DeLong-based significance testing. The Stacking Ensemble achieved the strongest results (PR-AUC = 0.93; F1 = 0.83), outperforming both classical and modern baseline models. Interpretability analyses (SHAP, LIME, and counterfactual explanations) consistently identified leverage, profitability, and liquidity indicators as dominant drivers of fraud risk, supported by a SHAP Stability Index of 0.87. Decision curve analysis showed that calibrated thresholds improved decision efficiency by 7–9% and reduced over-audit costs by 3–4%, while an audit cost simulation estimated annual savings of 80–100 million MNT. Overall, the proposed ML–XAI–DCA framework offers a transparent, interpretable, and cost-efficient approach for enhancing fraud detection in emerging-market contexts with limited textual disclosures.
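The decision curve analysis mentioned in the abstract rests on the standard net-benefit quantity: at a probability threshold p_t, net benefit = TP/N - (FP/N) * p_t / (1 - p_t). The following is a minimal sketch of that calculation only, not the authors' implementation. The function names and the toy labels and scores are hypothetical and chosen purely for illustration; they are not data or results from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of auditing every firm-year whose predicted fraud
    probability is at or above `threshold` (standard decision-curve formula)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged cases that are actually fraudulent.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged cases that are not fraudulent (over-audits).
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # The threshold odds weight the cost of a false positive against
    # the benefit of a true positive.
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_audit_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: audit every firm-year regardless of the model."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical toy data: 1 = fraudulent firm-year, 0 = non-fraudulent.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.35, 0.05, 0.15]

for t in (0.2, 0.5):
    model_nb = net_benefit(toy_labels, toy_scores, t)
    all_nb = net_benefit_audit_all(toy_labels, t)
    print(f"threshold={t:.2f}  model={model_nb:.3f}  audit-all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the audit-all reference and zero (the audit-none strategy). Plotting net benefit across a range of thresholds gives the decision curve.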
