Bias-Variance Tradeoff Decomposition-Based Machine Learning Model Selection: Application to Credit Risk Analysis
Abstract
In machine learning (ML), model selection is crucial: it determines the model that will be used in operation, and the selected model is expected to perform well on unseen data. In this paper, we propose a framework for model selection based on the decomposition of the bias-variance tradeoff. The framework is defined along three dimensions: model complexity, learning-set size, and loss level. Our technique first addresses the sample-size dimension by constructing a learning-convergence detection mechanism. Once the optimal sample size is determined, the ideal model complexity level is chosen by quantifying the bias-variance tradeoff. We applied this method to find the optimal XGBoost model for a credit risk classification problem. The study revealed that the learning process converges to a steady state at a certain training-set size, beyond which no significant reduction in the loss function is observed. Furthermore, increasing the model's complexity (maximum depth in this study) beyond the selected level does not significantly improve its performance. Moreover, the correlation and covariance between bias and variance are lowest at the optimal complexity level among all candidate models. Notably, the selected model achieved the best performance during the test phase.
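The kind of decomposition the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it estimates bias and variance under 0-1 loss via bootstrap resampling (a Domingos-style decomposition, where the "main prediction" is the majority vote across bootstrap models), and it uses scikit-learn's `DecisionTreeClassifier` with varying `max_depth` as a stand-in for tuning XGBoost's maximum depth. The function name `bias_variance_01` and the synthetic dataset are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bias_variance_01(model_factory, X_tr, y_tr, X_te, y_te,
                     n_rounds=50, seed=0):
    """Estimate bias and variance under 0-1 loss by bootstrap resampling.

    Illustrative sketch only: DecisionTreeClassifier's max_depth stands in
    for XGBoost's maximum-depth complexity parameter from the abstract.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rounds, len(y_te)), dtype=int)
    for r in range(n_rounds):
        # Draw a bootstrap training sample and record test predictions.
        idx = rng.integers(0, len(y_tr), len(y_tr))
        model = model_factory().fit(X_tr[idx], y_tr[idx])
        preds[r] = model.predict(X_te)
    # Main prediction = majority vote across the bootstrap models.
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    bias = float(np.mean(main != y_te))              # systematic error
    variance = float(np.mean(preds != main[None, :]))  # spread around main
    return bias, variance

# Synthetic stand-in data for the credit-risk classification task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

for depth in (1, 4, 12):  # candidate complexity levels
    b, v = bias_variance_01(
        lambda: DecisionTreeClassifier(max_depth=depth, random_state=0),
        X_tr, y_tr, X_te, y_te)
    print(f"max_depth={depth:2d}  bias={b:.3f}  variance={v:.3f}")
```

In the paper's framework, the complexity level minimizing the combined bias-variance behavior (and, per the abstract, the one where bias and variance are least correlated) would be selected; the same loop structure applies with an XGBoost estimator in place of the decision tree.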