Enhanced Predictive Modeling for Financial Risk Assessment using Hybrid AI (ML & DL) on Structured and Unstructured Data
Abstract
Financial institutions find it increasingly difficult to assess credit risk accurately, particularly when their data sources include both structured financial records and unstructured text. This research proposes an enhanced predictive modelling framework that combines conventional Machine Learning (ML) and Deep Learning (DL) techniques to analyse financial risk from both structured and unstructured data. The structured data come from the Home Credit Default Risk dataset (307,511 applicant records, 122 features). To add predictive signal, unstructured data from relevant news and social media sources are processed with Natural Language Processing (NLP) methods, including sentiment analysis. A standard preprocessing pipeline is applied to the fused dataset: missing-value imputation, label encoding, feature scaling, and SMOTEENN resampling to address class imbalance. The ML pipeline combines XGBoost, LightGBM, and Random Forest in a Stacking ensemble, while the DL pipeline uses a CNN-LSTM architecture to capture both local patterns and temporal correlations. Evaluation using Accuracy, Precision, Recall, F1-score, ROC-AUC, and confusion matrix analysis shows that adding unstructured variables significantly improves predictive performance over structured-only baselines. The Stacking Classifier performs best, with a ROC-AUC of 0.9816 and an F1-score of 0.95, while the CNN-LSTM reaches a ROC-AUC of 0.9592 and an F1-score of 0.91. The proposed hybrid approach highlights the benefits of combining structured and unstructured data for accurate and scalable credit risk assessment in modern financial systems.
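For reference, the evaluation metrics named in the abstract can be computed from the confusion matrix and the predicted scores. The plain-Python sketch below is illustrative only. The function names are hypothetical, the toy labels and scores are made up, and a 0.5 decision threshold is assumed. ROC-AUC is computed through its rank interpretation: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. This equals the area under the ROC curve. The pairwise loop is O(P·N) and is meant for small examples only.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for binary labels, where 1 = default."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (default) class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of random positive > score of random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))        # (4, 1, 1, 2)
print(classification_metrics(y_true, y_pred))  # accuracy 0.75; precision, recall, F1 about 0.667
print(roc_auc(y_true, scores))                 # 13/15, about 0.867
```

The example also shows why ROC-AUC and F1 can rank models differently. ROC-AUC depends only on how the scores order the applicants, while F1 depends on the chosen threshold.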