Tax Evasion Prediction Using Financial Ratios and Machine Learning: A Hybrid Model Based on MLP, Naive Bayes, SVM, and Harmony Search Optimization

Abstract

This study proposes a hybrid framework for predicting tax evasion from financial ratios using machine learning. The framework uses the Harmony Search (HS) algorithm for both feature selection and hyperparameter optimization, combined with four classifiers: Logistic Regression (LR), Multilayer Perceptron (MLP), Support Vector Machine with an RBF kernel (SVM), and Naive Bayes (NB). The dataset covers 180 companies over 2019–2021. Generalization was assessed in three ways: stratified 10-fold cross-validation repeated 8 times, a temporal split (training on 2019–2020, testing on 2021), and an external hold-out sample of 12 firms. Preprocessing included missing-value treatment, outlier detection, and Z-score standardization. To strengthen the methodology, class imbalance was handled with SMOTE and class weighting, and the classifiers were compared with McNemar's and Friedman tests. Explainability was provided through SHAP analysis, which gives both global and local insight into each variable's contribution. The hybrid pipeline achieved high accuracy, precision, and robustness across several evaluation metrics: classification accuracy (CA), F1 score, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). The MLP model showed the most stable performance. These findings suggest that hybrid intelligent models can serve as reliable decision-support tools in tax auditing, although the small sample limits how far the results can be generalized.
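The abstract does not state how Harmony Search encodes a candidate feature subset or which settings it uses. The following is a minimal sketch of binary Harmony Search for feature selection. It assumes one 0/1 bit per financial ratio and a user-supplied fitness function; in the study's setting this would presumably be a cross-validated score of one of the classifiers. The function name `harmony_search_select`, the parameters `hms`, `hmcr`, and `par` with their defaults, the bit-flip pitch adjustment, and `toy_fitness` are all illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = financial ratio kept, 0 = ratio dropped


def harmony_search_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    hms: int = 10,        # harmony memory size (illustrative default)
    hmcr: float = 0.9,    # harmony memory considering rate
    par: float = 0.3,     # pitch adjustment rate (here: probability of a bit flip)
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Binary Harmony Search that maximises `fitness` over 0/1 feature masks."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        # Reject the empty mask: a classifier needs at least one feature.
        while True:
            m = [rng.randint(0, 1) for _ in range(n_features)]
            if any(m):
                return m

    memory = [random_mask() for _ in range(hms)]
    scores = [fitness(m) for m in memory]

    for _ in range(iterations):
        new: Mask = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit                    # pitch adjustment (bit flip)
            else:
                bit = rng.randint(0, 1)              # random selection
            new.append(bit)
        if not any(new):
            continue
        score = fitness(new)
        worst = scores.index(min(scores))
        if score > scores[worst]:                    # replace the worst harmony
            memory[worst], scores[worst] = new, score

    best = scores.index(max(scores))
    return memory[best], scores[best]


# Hypothetical stand-in for a cross-validated classifier score: it rewards the
# made-up "informative" features 0, 2 and 5 and penalises every kept feature.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = harmony_search_select(8, toy_fitness, seed=1)
    print("selected features:", [i for i, b in enumerate(mask) if b],
          "fitness:", round(score, 2))
```

HS-based hyperparameter tuning would follow the same loop. The harmony would instead be a vector of real-valued parameters (for example SVM `C` and `gamma`), and pitch adjustment would nudge a value by a small bandwidth rather than flip a bit.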
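McNemar's test compares two classifiers evaluated on the same cases, such as the 2021 test year of the temporal split. It looks only at the cases where exactly one of the two classifiers is correct. The following is a minimal sketch using the continuity-corrected statistic and the chi-square (1 degree of freedom) p-value. The abstract does not say whether the corrected, uncorrected, or exact form was used, so this choice and the function name `mcnemar` are assumptions.

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> Tuple[float, float]:
    """Continuity-corrected McNemar test for two classifiers on shared cases."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have equal length")
    # b: A correct and B wrong; c: A wrong and B correct.
    # Cases where both are right or both are wrong do not enter the test.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never differ in correctness
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Illustrative data: A is right and B wrong on 10 cases, the reverse on 2,
# and both are right on the remaining 8.
y = [1] * 12 + [0] * 8
a = [1] * 10 + [0] * 2 + [0] * 8
b = [0] * 10 + [1] * 2 + [0] * 8
print(mcnemar(y, a, b))
```

For this made-up data the statistic is (|10 − 2| − 1)² / 12 = 49/12, which is just above the 3.84 critical value at the 5% level.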
