A Two-Stage Machine Learning Approach to Bankruptcy Prediction: From Comprehensive Modeling to Feature Selection for Noise Reduction
Abstract
Corporate bankruptcy prediction has become increasingly critical amid economic uncertainty. This study proposes a novel two-stage machine learning approach to improve bankruptcy prediction accuracy and applies it to companies listed on the Tokyo Stock Exchange. In the first stage, models were trained on 173 financial indicators. In the second stage, a wrapper-based feature selection process reduced dimensionality and eliminated noise, identifying an optimal set of seven features. Two ensemble learning methods, Random Forest and LightGBM, were used. With the reduced feature set, Random Forest correctly predicted 566 bankruptcies (88 more than with all features), whereas LightGBM correctly predicted 451 (31 more than with all features). The study also addresses the challenges posed by imbalanced data by employing resampling techniques (SMOTE, SMOTE-ENN, and KMeans). In addition, it recognizes the need for industry-specific modeling by constructing models for each of six industry sectors. These findings highlight the importance of feature selection and ensemble learning for improving model generalizability and uncovering industry-specific patterns. This study contributes to the field of bankruptcy prediction by providing a robust framework for accurate and interpretable predictions, suited to both academic research and practical applications. Future work will focus on further improving prediction accuracy so that more potential bankruptcies can be identified.
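To make the second stage concrete, the following is a minimal sketch of wrapper-based feature selection. The abstract does not state which search strategy or scoring metric the authors used, so greedy forward selection scored by cross-validated balanced accuracy is an assumption here. To keep the example dependency-free, a simple nearest-centroid classifier stands in for Random Forest and LightGBM, and the data are synthetic rather than Tokyo Stock Exchange records. All function names (`make_data`, `cv_score`, `forward_select`, and so on) are hypothetical.

```python
import random


def make_data(n=400, n_features=8, n_informative=2, seed=0):
    """Synthetic imbalanced data (10% positives); only the first
    n_informative columns carry signal, the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 10 == 0 else 0  # every tenth row is a "bankruptcy"
        row = []
        for j in range(n_features):
            shift = 3.0 * label if j < n_informative else 0.0
            row.append(rng.gauss(shift, 1.0))
        X.append(row)
        y.append(label)
    return X, y


def stratified_folds(y, k):
    """Assign fold ids so each fold gets a similar share of each class."""
    counters = {0: 0, 1: 0}
    folds = []
    for label in y:
        folds.append(counters[label] % k)
        counters[label] += 1
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the rare class counts as much as the common one."""
    recalls = []
    for cls in (0, 1):
        hits = [1 if p == cls else 0 for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)


def cv_score(X, y, features, k=5):
    """Cross-validated balanced accuracy of a nearest-centroid model
    that only sees the given feature columns."""
    folds = stratified_folds(y, k)
    scores = []
    for fold in range(k):
        train = [i for i in range(len(y)) if folds[i] != fold]
        test = [i for i in range(len(y)) if folds[i] == fold]
        centroids = {}
        for cls in (0, 1):
            rows = [X[i] for i in train if y[i] == cls]
            centroids[cls] = [sum(r[j] for r in rows) / len(rows) for j in features]
        preds = []
        for i in test:
            dist = {
                cls: sum((X[i][j] - c) ** 2 for j, c in zip(features, centroids[cls]))
                for cls in (0, 1)
            }
            preds.append(min(dist, key=dist.get))
        scores.append(balanced_accuracy([y[i] for i in test], preds))
    return sum(scores) / k


def forward_select(X, y, max_features=3):
    """Greedy wrapper: repeatedly add the feature that most improves the
    cross-validated score; stop when nothing helps or the cap is reached."""
    selected, best_score = [], 0.0
    candidates = set(range(len(X[0])))
    while candidates and len(selected) < max_features:
        scored = [(cv_score(X, y, selected + [j]), j) for j in sorted(candidates)]
        score, j = max(scored)
        if score <= best_score:
            break  # no remaining feature improves the score
        selected.append(j)
        candidates.remove(j)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    X, y = make_data()
    selected, score = forward_select(X, y, max_features=3)
    print("selected feature indices:", selected)
    print("cross-validated balanced accuracy:", round(score, 3))
```

The point of the wrapper design is that candidate feature subsets are judged by the predictive model itself, not by a model-independent statistic. That is how noisy indicators end up excluded. In a setting closer to the study, the scoring model would be the ensemble classifier, and resampling would be applied only within the training folds.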