Bias-Variance Tradeoff Decomposition-Based Machine Learning Model Selection: Application to Credit Risk Analysis
Abstract
In machine learning (ML), model selection is crucial: it determines the model that will be used in operation, and the selected model is expected to perform well on unseen data. In this paper, we propose a framework for model selection based on the decomposition of the bias-variance tradeoff. The framework is defined along three dimensions: model complexity, learning-set size, and loss level. Our technique first addresses the sample-size dimension by constructing a learning-convergence detection mechanism. Once the optimal sample size is determined, the ideal model complexity level is chosen by quantifying the bias-variance tradeoff. We applied this method to find the optimal XGBoost model for a credit risk classification problem. The study revealed that the learning process converges to a steady state at a certain training-set size, beyond which no significant reduction in the loss function is observed. Furthermore, increasing the model's complexity (maximum depth in this study) beyond the selected level does not significantly improve its performance. Moreover, the correlation and covariance between bias and variance are lowest at the optimal complexity level among all candidate models. Notably, the selected model achieved the best performance during the test phase.
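The kind of decomposition the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it estimates bias and variance under 0-1 loss via bootstrap resampling (a Domingos-style decomposition, where the "main prediction" is the majority vote across bootstrap models), and it uses scikit-learn's `DecisionTreeClassifier` with varying `max_depth` as a stand-in for tuning XGBoost's maximum depth. The function name `bias_variance_01` and the synthetic dataset are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bias_variance_01(model_factory, X_tr, y_tr, X_te, y_te,
                     n_rounds=50, seed=0):
    """Estimate bias and variance under 0-1 loss by bootstrap resampling.

    Illustrative sketch only: DecisionTreeClassifier's max_depth stands in
    for XGBoost's maximum-depth complexity parameter from the abstract.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rounds, len(y_te)), dtype=int)
    for r in range(n_rounds):
        # Draw a bootstrap training sample and record test predictions.
        idx = rng.integers(0, len(y_tr), len(y_tr))
        model = model_factory().fit(X_tr[idx], y_tr[idx])
        preds[r] = model.predict(X_te)
    # Main prediction = majority vote across the bootstrap models.
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    bias = float(np.mean(main != y_te))              # systematic error
    variance = float(np.mean(preds != main[None, :]))  # spread around main
    return bias, variance

# Synthetic stand-in data for the credit-risk classification task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

for depth in (1, 4, 12):  # candidate complexity levels
    b, v = bias_variance_01(
        lambda: DecisionTreeClassifier(max_depth=depth, random_state=0),
        X_tr, y_tr, X_te, y_te)
    print(f"max_depth={depth:2d}  bias={b:.3f}  variance={v:.3f}")
```

In the paper's framework, the complexity level minimizing the combined bias-variance behavior (and, per the abstract, the one where bias and variance are least correlated) would be selected; the same loop structure applies with an XGBoost estimator in place of the decision tree.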