Tax Evasion Prediction Using Financial Ratios and Machine Learning: A Hybrid Model Based on MLP, Naive Bayes, SVM, and Harmony Search Optimization

MAHA MOHEY Elweshihy
Samir Aboul Fotuoh Saleh

Abstract

This study proposes a hybrid framework for predicting tax evasion from financial ratios using machine learning. The approach combines feature selection and hyperparameter optimization via the Harmony Search (HS) algorithm with four classifiers: Logistic Regression (LR), Multilayer Perceptron (MLP), Support Vector Machine (SVM) with an RBF kernel, and Naive Bayes (NB). The dataset covers 180 companies over 2019–2021. Models were evaluated with stratified 10-fold cross-validation repeated 8 times, a temporal split (2019–2020 for training, 2021 for testing), and an external hold-out sample of 12 firms to assess generalization. Data preprocessing included treatment of missing values, outlier detection, and Z-score standardization. Class imbalance was handled with SMOTE and class weighting, and the classifiers were compared using McNemar and Friedman statistical tests. SHAP analysis provided both global and local insights into variable contributions. Experimental results indicate that the hybrid pipeline achieves high accuracy, precision, and robustness across multiple evaluation metrics (CA, F1, MCC, and AUC), with the MLP model showing the most stable performance. These findings highlight the potential of hybrid intelligent models as reliable decision-support tools in tax auditing, while acknowledging the limitations of small-sample contexts.
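As an illustration of the Harmony Search step named in the abstract, the following is a minimal sketch of a basic continuous Harmony Search loop. It is not the authors' implementation: the function name `harmony_search`, its parameters (`hms`, `hmcr`, `par`, `bandwidth`), their default values, and the quadratic `toy_objective` are all assumptions chosen for illustration. In a pipeline like the one described, the objective would presumably be a cross-validated error computed for a candidate set of hyperparameters or features.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Harmony Search.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Harmony memory: `hms` random candidate vectors and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this dimension.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: nudge the value within the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the full allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))  # keep inside the bounds

        # Replace the worst stored harmony if the new one scores better.
        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_objective(x):
    # Hypothetical stand-in for a cross-validated error; minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_score = harmony_search(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_score)
```

For feature selection, the same loop could be adapted by treating each dimension as a binary include/exclude decision, though the abstract does not state which encoding the study uses.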

Version published to 10.21203/rs.3.rs-7490676/v1 on Research Square
Sep 3, 2025

Related articles

Construction and analysis of data model for financial market volatility prediction based on support vector machine

This article has 1 author:
1. XiaoMeng Su
This article has no evaluations
Latest version Jan 21, 2026

Directional Forecasting of WTI and Brent Crude Oil Prices: A Machine Learning Approach with Technical Indicators at Daily, Weekly, and Monthly Frequencies

This article has 3 authors:
1. Badr Alnssyan
2. Muhammad Ali
3. Muhammad Ahmad
This article has no evaluations
Latest version Dec 16, 2025

Applying Multiple Linear Regression to Enhance Short-Term Stock Forecasting Accuracy

This article has 2 authors:
1. TOUSIF AL RASHID
2. Raj Kumar
This article has no evaluations
Latest version Dec 15, 2025
