Consensus-Driven Feature Selection for Transparent and Robust Loan Default Prediction
Abstract
Accurate loan default prediction is essential for financial stability and inclusion, yet remains challenging because borrower data are high-dimensional, imbalanced, and heterogeneous. Traditional feature selection methods often suffer from redundancy, dominance, and instability, resulting in suboptimal and less interpretable models. To address these challenges, we propose a Hybrid Rank-Aggregated Feature Selection (HRA-FS) framework that integrates ReliefF, Recursive Feature Elimination, and ElasticNet through Borda count aggregation. The framework also applies strategic feature categorization to mitigate domain dominance, ensuring balanced representation across diverse risk drivers and thereby enhancing interpretability and operational trust. Evaluated with XGBoost on real-world imbalanced datasets of 2044 Chinese farmers and 3045 small firms, HRA-FS consistently outperforms every single feature selection method, achieving a ROC-AUC of 0.965 for firms. The method identifies compact, predictive feature sets, including critical attributes such as house value and inventory turnover rate. Our findings demonstrate that this consensus-driven approach resolves the trilemma of accuracy, stability, and interpretability, offering lenders robust tools for equitable credit assessment and fostering inclusive financial ecosystems.
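To illustrate the aggregation step named in the abstract, the following is a minimal sketch of a Borda count over several feature rankings. It is not the authors' implementation: the function name `borda_aggregate`, the alphabetical tie-breaking rule, the scoring of n - 1 points for the top position, and the example feature names and rankings are all assumptions made for illustration. In practice, the three input rankings would come from ReliefF, Recursive Feature Elimination, and ElasticNet.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k=None):
    """Combine several best-first feature rankings with a Borda count.

    Each ranking is a list of feature names ordered from most to least
    important. A feature in position p of a ranking of length n earns
    n - 1 - p points; points are summed across rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Highest total first; ties broken alphabetically (an assumption,
    # chosen only to make the output deterministic).
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_k] if top_k is not None else ordered


# Hypothetical rankings, standing in for the output of three selectors.
relieff = ["house_value", "income", "loan_amount", "age"]
rfe = ["income", "house_value", "age", "loan_amount"]
elastic_net = ["house_value", "loan_amount", "income", "age"]

consensus = borda_aggregate([relieff, rfe, elastic_net], top_k=3)
print(consensus)
```

A feature that ranks moderately well under every selector can outscore one that a single selector favors strongly, which is the sense in which the consensus set is less sensitive to the bias of any one method.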