Financial Statement Fraud Detection Through an Integrated Machine Learning and Explainable AI Framework
Abstract
Financial statement fraud remains a substantial risk in environments marked by weak regulatory oversight and information asymmetry. This study develops a decision-centric framework that integrates machine learning, explainable artificial intelligence, and decision curve analysis to improve fraud detection under severe class imbalance. Using 969 firm-year observations from 132 Mongolian firms (2013–2024), we evaluated 21 financial ratios with models including Random Forest, XGBoost, LightGBM, MLP, TabNet, and a Stacking Ensemble trained with SMOTE and class-weighted learning. Performance was assessed using PR-AUC, F1-score, Recall, and DeLong-based significance testing. The Stacking Ensemble achieved the strongest results (PR-AUC = 0.93; F1 = 0.83), outperforming both classical and modern baseline models. Interpretability analyses (SHAP, LIME, and counterfactual explanations) consistently identified leverage, profitability, and liquidity indicators as dominant drivers of fraud risk, supported by a SHAP Stability Index of 0.87. Decision curve analysis showed that calibrated thresholds improved decision efficiency by 7–9% and reduced over-audit costs by 3–4%, while an audit cost simulation estimated annual savings of 80–100 million MNT. Overall, the proposed ML–XAI–DCA framework offers a transparent, interpretable, and cost-efficient approach for enhancing fraud detection in emerging-market contexts with limited textual disclosures.
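The abstract does not spell out how decision curve analysis was computed. As a rough illustration only, the sketch below uses the standard net-benefit definition, in which false positives are weighted by the odds of the probability threshold, and compares a model against an "audit every firm" policy. The function names `net_benefit` and `net_benefit_treat_all`, the labels, and the scores are all hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging cases whose predicted probability >= threshold.

    Standard decision-curve definition (assumed, not confirmed by the paper):
    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Each false positive costs threshold / (1 - threshold) true positives.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference policy that flags every case."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Made-up toy data: 1 = fraudulent firm-year, 0 = non-fraudulent.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8, 0.05, 0.15]

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model_nb:.3f}  audit-all={all_nb:.3f}")
```

Plotting net benefit against the threshold for the model, the audit-all policy, and an audit-none policy (net benefit of zero) gives the decision curve. A model adds value over the thresholds where its curve lies above both reference policies.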