High-Dimensional Multi-Source Feature Fusion for Early Default Prediction in Consumer Credit Portfolios


Abstract

This study develops a multi-source feature-fusion framework that combines transaction histories, mobile-behavior data, credit-bureau information, and merchant-level attributes. The feature space contains over 4,800 engineered variables derived from 3.5 million customer records. A three-stage selection pipeline (correlation filtering, mutual-information ranking, and stability-selection LASSO) reduces dimensionality by 92%. The selected features are used to train a LightGBM model optimized for early-stage (0–30 day) delinquency prediction. The model achieves an ROC-AUC of 0.91 and reduces false-negative early defaults by 37.5% relative to a baseline logistic regression. Feature-importance patterns reveal strong interactions between merchant-category instability and device-behavior anomalies. These results indicate that multi-source feature fusion is effective for fine-grained default prediction.
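The three-stage selection named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds (0.95 correlation cutoff, top-10 mutual-information cut, 70% selection frequency), and the synthetic data are all assumptions made for illustration. As a further simplification, the LASSO step fits a linear LASSO to the 0/1 label by coordinate descent rather than an L1-penalized classifier, and the final LightGBM training step is omitted.

```python
import numpy as np

def correlation_filter(X, threshold=0.95):
    """Stage 1: keep a feature only if it is not highly correlated
    with a feature that was already kept (threshold is an assumption)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def mutual_information(x, y, bins=8):
    """Stage 2 score: histogram estimate of I(x; y) in nats, binary y."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    xb = np.digitize(x, edges)              # bin index in 0..bins-1
    joint = np.zeros((bins, 2))
    np.add.at(joint, (xb, y), 1)            # contingency counts
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def lasso_cd(X, y, alpha, n_iter=50):
    """Coordinate-descent LASSO; assumes unit-variance columns, centered y."""
    n, p = X.shape
    w = np.zeros(p)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * w[j]         # remove feature j's contribution
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            resid -= X[:, j] * w[j]
    return w

def stability_selection(X, y, alpha=0.08, n_rounds=30, min_freq=0.7, seed=0):
    """Stage 3: refit LASSO on random half-samples and keep features
    that receive a nonzero coefficient in at least min_freq of the rounds."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx]
        Xs = (Xs - Xs.mean(0)) / (Xs.std(0) + 1e-12)
        ys = y[idx] - y[idx].mean()
        counts += (lasso_cd(Xs, ys, alpha) != 0)
    return np.flatnonzero(counts / n_rounds >= min_freq)

# Synthetic stand-in data (hypothetical): features 0 and 1 drive the label,
# feature 2 is a near-duplicate of feature 0, the rest are noise.
rng = np.random.default_rng(42)
n, p = 2000, 40
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=n)
latent = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)
y = (latent > 0).astype(int)

stage1 = correlation_filter(X)                               # drop redundant columns
mi = np.array([mutual_information(X[:, j], y) for j in stage1])
stage2 = [stage1[i] for i in np.argsort(mi)[::-1][:10]]      # top 10 by MI
picked = stability_selection(X[:, stage2], y)
final = sorted(stage2[i] for i in picked)                    # original column indices
print("after stage 1:", len(stage1), "| after stage 2:", len(stage2), "| final:", final)
```

On real data the surviving columns would then be passed to a gradient-boosted classifier. The ordering reflects a common design choice: the cheap correlation filter and the univariate ranking shrink the candidate set before the more expensive repeated LASSO fits.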
