Detecting financial misstatements in emerging markets: a machine learning approach

Hoa Thi Thanh Tieu
Thanh Hien Hoang
Hung Ngoc Tran

Abstract

This study develops a machine learning-based framework for detecting material misstatements in the financial statements of Vietnamese listed companies. Using 10,286 firm-year observations from 2016 to 2023, it applies two ensemble algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), to a binary classification task in which the label is derived from audit-adjusted profit discrepancies. To address class imbalance and improve prediction reliability, the Synthetic Minority Over-sampling Technique (SMOTE) is applied within a stratified cross-validation procedure, and Bayesian optimization tunes the hyperparameters to enhance generalization performance. Both RF and XGBoost achieve high predictive accuracy (approximately 0.839) and strong discriminative power (AUC-ROC of approximately 0.91), outperforming logistic regression. Model interpretability is improved through the Least Absolute Shrinkage and Selection Operator (LASSO), which selects key financial and non-financial predictors from more than 50 variables. RF's feature importance analysis further highlights the influence of listing exchange characteristics, prior misstatement history, and forward-looking performance indicators. The proposed framework offers auditors and regulators a scalable, data-driven tool for risk-based audit planning and regulatory oversight, which is particularly valuable in emerging markets with limited confirmed fraud data.
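The abstract does not say how SMOTE was combined with the stratified cross-validation, so the following is only a minimal sketch of the general idea, not the authors' code. It uses the Python standard library alone. The toy data, the function names (`stratified_folds`, `smote`, `oversampled_training_sets`), the fold count, and the neighbour count are all assumptions made for illustration. The point it shows is that synthetic minority samples are generated from the training folds only, so each validation fold keeps the original class ratio.

```python
import math
import random


def stratified_folds(labels, k, rng):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class out round-robin
    return folds


def smote(minority, n_new, k_neighbors, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k_neighbors]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def oversampled_training_sets(X, y, k=5, k_neighbors=3, seed=0):
    """Yield (X_train, y_train, val_idx) per fold.

    Assumes label 1 is the minority class. Only the training part is
    resampled; the validation indices point at untouched original rows.
    """
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
        n_new = (len(y_tr) - len(minority)) - len(minority)  # balance the classes
        new_pts = smote(minority, n_new, k_neighbors, rng)
        yield X_tr + new_pts, y_tr + [1] * len(new_pts), val_idx


# Hypothetical toy data: 90 "clean" rows (label 0) and 10 "misstated" rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

results = list(oversampled_training_sets(X, y))
for X_tr, y_tr, val_idx in results:
    val_pos = sum(y[i] for i in val_idx)
    print(len(y_tr), y_tr.count(1), len(val_idx), val_pos)
```

In a real workflow a classifier would be fitted on each resampled training set and scored on the matching validation rows. Libraries such as imbalanced-learn and scikit-learn provide tested versions of these pieces, and whether the study used them is not stated in the abstract.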

Version published to 10.21203/rs.3.rs-7360630/v1 on Research Square
Oct 1, 2025

Related articles

Forecasting Crude Oil Prices: Insights from Machine Learning Approaches
Author: Haseen
No evaluations. Latest version: Dec 16, 2025

A Unified Machine Learning Framework for Enterprise Portfolio Forecasting, Risk Detection, and Automated Reporting
Author: Ashutosh Agarwal
No evaluations. Latest version: Dec 10, 2025

Deep Learning Architectures for Credit Risk Assessment
Author: Balogun David Taiwo
No evaluations. Latest version: Jan 19, 2026
