A Two-Stage Machine Learning Approach to Bankruptcy Prediction: Integrating Full-Feature Modeling and Optimized Feature Selection

Masanobu Matsumaru
Hideki Katagiri

Abstract

Corporate bankruptcy prediction has become increasingly critical amid economic uncertainty. This study proposes a novel two-stage machine learning approach to enhance bankruptcy prediction accuracy, applied to Tokyo Stock Exchange-listed companies. In the first stage, models were trained using 173 financial indicators. In the second stage, a wrapper-based feature selection process was employed to reduce dimensionality and eliminate noise, thereby identifying an optimal seven-feature set. Two ensemble learning methods were used: Random Forest and Light Gradient Boosting Machine (LightGBM). LightGBM is a gradient boosting–based ensemble learning method that employs a leaf-wise tree growth strategy, enabling fast computation and high predictive accuracy, especially on large-scale and high-dimensional datasets. Using the reduced feature set, Random Forest correctly predicted 566 bankruptcies (88 more than when using all features), compared with 451 for LightGBM (31 more than when using all features). The study also addresses the challenges posed by imbalanced data by employing resampling techniques (SMOTE, SMOTE-ENN, and KMeans). Additionally, the need for industry-specific modeling is recognized by constructing separate models for six industry sectors. These findings highlight the importance of feature selection and ensemble learning for improving model generalizability and uncovering industry-specific patterns. This study contributes to the field of bankruptcy prediction by providing a robust framework for accurate and interpretable predictions that serves both academic research and practical applications. Future work will focus on further enhancing prediction accuracy to identify more potential bankruptcies.
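
To make the two-stage idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. A nearest-centroid classifier stands in for Random Forest and LightGBM, greedy forward selection stands in for the wrapper method (the abstract does not say which wrapper strategy was used), and the data are synthetic with 12 features rather than 173. The cap of seven features mirrors the seven-feature set mentioned above. All function and variable names are hypothetical.

```python
import random


def make_data(n, n_features, informative, rng):
    """Synthetic rows (x, y); y = 1 marks a 'bankrupt' firm (every 4th row).

    Only the columns listed in `informative` shift with the label;
    the remaining columns are pure noise.
    """
    rows = []
    for i in range(n):
        y = 1 if i % 4 == 0 else 0
        x = [rng.gauss(2.0 if (y == 1 and j in informative) else 0.0, 1.0)
             for j in range(n_features)]
        rows.append((x, y))
    return rows


def centroid(rows, label, features):
    """Mean of the chosen feature columns over rows with the given label."""
    pts = [x for x, y in rows if y == label]
    return [sum(p[j] for p in pts) / len(pts) for j in features]


def n_correct(train, valid, features):
    """Validation rows classified correctly by a nearest-centroid model
    fitted on `train` and restricted to the chosen feature columns."""
    c0 = centroid(train, 0, features)
    c1 = centroid(train, 1, features)
    correct = 0
    for x, y in valid:
        d0 = sum((x[j] - c) ** 2 for j, c in zip(features, c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(features, c1))
        predicted = 1 if d1 < d0 else 0
        correct += int(predicted == y)
    return correct


def forward_select(evaluate, n_features, max_features):
    """Greedy wrapper selection: repeatedly add the single feature that
    most improves the score; stop when nothing improves or the cap is hit."""
    selected, best = [], -1
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        score, j = max((evaluate(selected + [j]), j) for j in candidates)
        if score <= best:
            break
        selected.append(j)
        best = score
    return selected, best


rng = random.Random(0)
N_FEATURES = 12          # assumption: toy size, the study uses 173 indicators
INFORMATIVE = {0, 1, 2}  # assumption: which synthetic columns carry signal
train = make_data(200, N_FEATURES, INFORMATIVE, rng)
valid = make_data(200, N_FEATURES, INFORMATIVE, rng)


def evaluate(features):
    return n_correct(train, valid, features)


# Stage 1: score the model on every feature.
full_score = evaluate(list(range(N_FEATURES)))
# Stage 2: wrapper-based selection, capped at seven features.
selected, reduced_score = forward_select(evaluate, N_FEATURES, max_features=7)

print("all features:", full_score, "correct out of", len(valid))
print("selected", selected, "->", reduced_score, "correct out of", len(valid))
```

The wrapper treats the model as a black box: `forward_select` only needs a function that scores a candidate feature subset, so in practice the same loop could wrap a Random Forest or LightGBM model scored on a held-out set. Selecting on the same validation set that is later reported would overstate performance, so a real pipeline would keep a separate test split.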

Version published to 10.3390/jrfm18120662
Nov 22, 2025
Version published to 10.20944/preprints202510.2374.v1
Oct 30, 2025
