Stacked Ensemble Models for SME Credit Risk Assessment: Integrating Data Balancing and Feature Selection Techniques

Abstract

Small and Medium Enterprises (SMEs) often struggle to obtain credit because lenders perceive them as high credit risks. This study aims to develop a deep learning model using the stacking ensemble technique to improve the accuracy of credit risk assessment for SMEs. The research uses a dataset from the Ministry of Industry consisting of 14 quantitative and qualitative variables. Because the classes in the data are imbalanced, four data balancing techniques are applied: Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE combined with Edited Nearest Neighbors (SMOTEENN), and SMOTE combined with Tomek Links (SMOTETomek). The study compares nine machine learning models: Decision Tree, Support Vector Machines (SVM), Gradient Boosting, K-Nearest Neighbors (KNN), Naïve Bayes, Logistic Regression with Meta-Learning, Gradient Boosting with Meta-Learning, Extreme Gradient Boosting with Meta-Learning, and Multi-layer Perceptron Neural Network with Meta-Learning. Model performance is evaluated using Accuracy, Precision, Recall, F1-score, and Area Under the ROC Curve (AUC). The results indicate that the stacking ensemble technique performs best, in particular the Multi-layer Perceptron Neural Network with Meta-Learning, which achieves an F1-score of 0.953 and an AUC of 0.990. Logistic Regression with Meta-Learning and Gradient Boosting with Meta-Learning also yield strong results, outperforming baseline models such as standard Gradient Boosting. Furthermore, applying Stepwise Feature Selection reduces the number of variables without compromising model performance. Overall, the combination of stacking ensemble models, data balancing techniques, and feature selection significantly enhances the accuracy of credit risk assessment for SMEs.
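
To make the data balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' pipeline: the function name `smote_sketch`, its parameters, and the toy data are illustrative assumptions, and the sketch handles numeric features only, whereas the study's dataset also contains qualitative variables that would need separate treatment.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points by interpolation.

    Hypothetical helper for illustration; numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment
        # between the sample and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed, not from the study): two numeric features
# for four minority-class firms.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplication. To avoid leaking synthetic points into the evaluation, resampling of this kind is normally applied to the training split only.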
