Optimizing the Collection Process in Credit Risk Management: A Comparison of Machine Learning Techniques for Predicting Payment Probability at Different Stages of Arrears

Abstract

In credit risk management, scoring models based on logistic regression have been developed to optimize default risk assessment. However, these models require complex feature engineering, and their accuracy deteriorates as arrears progress. This study proposes the use of machine learning techniques (XGBoost and Artificial Neural Networks) to generate scores for different arrears segments (No Arrears Segment, Segment 1-30 days of arrears, Segment 31-90 days of arrears, and All Segments). The Kolmogorov-Smirnov (KS) statistic is used to assess the efficiency and predictive power of the models. To ensure the accuracy and reliability of the models, a five-step methodology is employed. It starts with the formulation of the problem, followed by the selection of a data sample and the definition of the target variable. A descriptive analysis of the data is then performed to guide data cleaning. Subsequently, the models are trained and tested. Finally, the results are analyzed and the resulting models are interpreted. The results show that both the XGBoost and Artificial Neural Network models outperform logistic regression in most of the arrears segments. In the No Arrears Segment, the XGBoost model performs best, with KS = 63.36%. In Segment 1-30, XGBoost is also the best, with KS = 51.38%. In Segment 31-90, the Artificial Neural Network model is the best, with KS = 38.77%. Finally, across all arrears segments combined, the XGBoost model is again the best, with KS = 74.05%.
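
To make the evaluation metric concrete, the following is a minimal sketch of how a KS statistic can be computed for a scoring model, both overall and per arrears segment. It is an illustration under stated assumptions, not the authors' implementation: the function names `ks_statistic` and `ks_by_segment`, the label coding (1 = payment, 0 = no payment), and the segment names in the usage example are hypothetical. KS is taken here as the maximum absolute gap between the cumulative score distributions of the two classes, returned on a 0 to 1 scale (multiply by 100 to express it as a percentage, as in the abstract).

```python
from collections import defaultdict


def ks_statistic(scores, labels):
    """Two-sample KS statistic between the score distributions of
    class 1 and class 0 (label coding is an assumption of this sketch)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute KS")

    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all observations tied at this score before comparing CDFs.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
    return ks


def ks_by_segment(scores, labels, segments):
    """Compute KS separately for each segment value (hypothetical helper)."""
    grouped = defaultdict(lambda: ([], []))
    for s, y, seg in zip(scores, labels, segments):
        grouped[seg][0].append(s)
        grouped[seg][1].append(y)
    return {seg: ks_statistic(sc, lb) for seg, (sc, lb) in grouped.items()}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    segments = ["no_arrears"] * 4 + ["1_30"] * 4
    print(ks_statistic(scores, labels))
    print(ks_by_segment(scores, labels, segments))
```

A higher KS indicates that the model's scores separate the two classes more cleanly. Computing it per segment, as sketched above, is what allows a comparison of models at different stages of arrears.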
