Explainable Machine Learning Models for Predicting Health-Related Quality of Life in High-Risk Cardiovascular Populations: A Comparative Analysis of SF-12 Data and Clinical Risk Stratification
Abstract
Background: Cardiovascular diseases (CVDs) continue to pose a substantial burden on global health. Traditional clinical metrics focus on physiological outcomes and often overlook the multidimensional nature of health-related quality of life (HRQoL). Moreover, conventional regression models struggle to capture non-linear patterns in HRQoL data. In this context, machine learning (ML) offers a promising alternative for predictive modelling and individualised risk assessment.
Methods: This cross-sectional study included 8,857 individuals at high risk of CVD in Nanjing, China. HRQoL was measured using the 12-item Short Form Health Survey Version 2 (SF-12v2), yielding Physical Component Summary (PCS) and Mental Component Summary (MCS) scores. Five ML models (SVM, LightGBM, XGBoost, Random Forest and Logistic Regression) were developed following hybrid feature selection (Random Forest-Recursive Feature Elimination). Model performance was evaluated using AUC, F1-score, accuracy, sensitivity, and specificity. SHAP (SHapley Additive exPlanations) analysis was used to quantify predictor contributions.
Results: The SVM model achieved the best performance in classifying PCS outcomes (AUC = 0.632), while LightGBM achieved the most balanced classification for MCS (AUC = 0.571) in terms of sensitivity (0.834) and specificity (0.238). Key predictors of HRQoL included physical activity (MET), occupation, and tea consumption. SHAP analysis revealed that individuals with MET ≥ 8,000 min/week were 14.5% more likely to attain high PCS scores, while daily tea consumption reduced the risk of psychological distress (MCS) by 19%.
Conclusion: ML models, particularly SVM and LightGBM, effectively predicted HRQoL in high-risk CVD populations, with MET, occupation, and lifestyle factors emerging as actionable intervention targets. SHAP interpretability strengthens clinical applicability, enabling personalised strategies for at-risk subgroups. These findings support the inclusion of ML-based HRQoL predictions in digital health frameworks for proactive, patient-tailored cardiovascular care.
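As a reading aid for the evaluation metrics named in the abstract, the following is a minimal sketch of how AUC, F1-score, accuracy, sensitivity, and specificity can be computed for a binary outcome. It is not the authors' pipeline. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not data from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics for binary labels (1 = positive class, e.g. a high score)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        # Share of true positives that the model recovers.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Share of true negatives that the model recovers.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
# Assumed decision threshold of 0.5; the abstract does not state one.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the abstract reports both kinds of figure side by side.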