Consensus-Driven Feature Selection for Transparent and Robust Loan Default Prediction


Abstract

Accurate loan default prediction is essential for financial stability and inclusion, yet it remains challenging because borrower data are high-dimensional, imbalanced, and heterogeneous. Traditional feature selection (FS) methods often suffer from redundancy, dominance, and instability, resulting in suboptimal and less interpretable models. To address these challenges, we propose a Hybrid Rank-Aggregated Feature Selection (HRA-FS) framework that integrates ReliefF, Recursive Feature Elimination, and ElasticNet through Borda count aggregation. The framework also incorporates strategic feature categorization to mitigate domain dominance, ensuring balanced representation across diverse risk drivers and thereby enhancing interpretability and operational trust. Evaluated with XGBoost on real-world imbalanced datasets of 2044 Chinese farmers and 3045 small firms, HRA-FS consistently outperforms all single FS methods, achieving a ROC-AUC of 0.965 for firms. The method identifies compact, predictive feature sets, including critical attributes such as house value and inventory turnover rate. Our findings demonstrate that this consensus-driven approach resolves the trilemma of accuracy, stability, and interpretability, offering lenders robust tools for equitable credit assessment and fostering inclusive financial ecosystems.
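To make the aggregation step concrete, the following is a minimal sketch of Borda count rank aggregation over several feature rankings. It is not the authors' implementation: the function name `borda_aggregate`, the feature names, the three example rankings, and the alphabetical tie-breaking rule are all assumptions made for illustration. Each ranking lists the same features from most to least important, and a feature in position p (counting from zero) among n features receives n - 1 - p points.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k=None):
    """Combine best-first feature rankings with a plain Borda count.

    Hypothetical helper for illustration only. Every ranking must contain
    the same features exactly once. Ties are broken alphabetically, which
    is an assumption of this sketch, not something the abstract specifies.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")

    features = set(rankings[0])
    n = len(features)
    scores = defaultdict(int)

    for ranking in rankings:
        if len(ranking) != n or set(ranking) != features:
            raise ValueError("all rankings must cover the same features")
        for position, feature in enumerate(ranking):
            # Top-ranked feature earns n - 1 points, last-ranked earns 0.
            scores[feature] += n - 1 - position

    # Highest total first; alphabetical order resolves equal totals.
    ordered = sorted(features, key=lambda f: (-scores[f], f))
    return ordered[:top_k], dict(scores)


# Illustrative rankings (invented values, not results from the paper),
# standing in for the output of ReliefF, RFE, and ElasticNet.
relieff_rank = ["house_value", "income", "loan_amount", "age"]
rfe_rank = ["income", "house_value", "age", "loan_amount"]
elasticnet_rank = ["house_value", "loan_amount", "income", "age"]

selected, scores = borda_aggregate(
    [relieff_rank, rfe_rank, elasticnet_rank], top_k=2
)
print(selected)
print(scores)
```

In this toy example a feature that ranks well under all three selectors accumulates the most points, so a single method's idiosyncratic preference has limited influence on the final subset. The category-balancing step described in the abstract is not modeled here.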
