Financial Statement Fraud Detection Through an Integrated Machine Learning and Explainable AI Framework
Abstract
Financial statement fraud (FSF) is more prevalent in economies with high information asymmetry and weak institutional control, where it threatens investor trust and financial stability. This study proposes an integrated, decision-centric framework that combines machine learning (ML), explainable artificial intelligence (XAI), and decision curve analysis (DCA) to improve detection under class-imbalanced conditions. Random Forest, XGBoost, LightGBM, and a Stacking Ensemble were trained on financial statement data from 132 Mongolian companies (2013–2024; 969 firm-year observations and 21 ratios). Class imbalance was addressed with SMOTE and class weighting. Model performance was evaluated using PR-AUC, F1-score, and Recall, while interpretability was analyzed through SHAP, LIME, and counterfactual explanations. DCA and audit cost simulations were conducted to assess decision utility. The Stacking Ensemble achieved the best performance (PR-AUC = 0.93; F1 = 0.83). SHAP and LIME identified leverage and liquidity ratios as key predictors, consistent with agency and signaling theories, and a SHAP Stability Index (SSI) of 0.87 supported the stability of these explanations. DCA results indicated a 7–9% improvement in decision efficiency, 3–4% lower audit costs, and annual savings of MNT 80–100 million. The study thus introduces a transparent, cost-efficient framework that integrates XAI, DCA, and audit cost simulation for optimized FSF detection and data-driven financial supervision.
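To make the decision curve analysis step more concrete, the sketch below computes the standard DCA net benefit, NB = TP/n - (FP/n) * pt / (1 - pt), for a model at a given threshold probability pt, alongside the "flag every firm" baseline. It is a minimal illustration, not the authors' implementation: the function names `net_benefit` and `net_benefit_flag_all` are hypothetical, and the labels and scores are made-up toy values rather than data from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging every case whose predicted probability >= threshold.

    Standard DCA definition: TP/n - FP/n * threshold / (1 - threshold).
    (Hypothetical helper name; assumes labels are 0/1 with 1 = fraud.)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    n = len(y_true)
    # Count flagged cases that are truly fraudulent (TP) or not (FP).
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - (fp / n) * odds


def net_benefit_flag_all(y_true: Sequence[int], threshold: float) -> float:
    """Baseline strategy: audit every firm regardless of the model score."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy firm-year labels and model scores (illustrative only, not from the study).
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2, 0.1, 0.3, 0.05, 0.4, 0.35, 0.15]

if __name__ == "__main__":
    for pt in (0.2, 0.5):
        model_nb = net_benefit(toy_labels, toy_scores, pt)
        all_nb = net_benefit_flag_all(toy_labels, pt)
        print(f"pt={pt}: model={model_nb:.3f}, flag-all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the flag-all baseline and zero (the "flag none" strategy); plotting these quantities over a range of thresholds gives the decision curve.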