Comprehensive, Transparent, and Fair Machine Learning Models for Hypertension Risk Prediction: Benchmarking With Framingham, External Validation, Individual-Level Analysis, and Equitable Clinical Utility

Parsa Amirian
Mahsa Zarpoosh


Abstract

Background: Hypertension (HTN) is a leading, yet often underdiagnosed, cause of cardiovascular disease worldwide. Although clinical risk scores such as the Framingham Risk Score (FRS) are widely used, their limited capacity to capture complex risk patterns and their poor generalizability hinder optimal prediction and prevention in resource-limited settings.

Methods: We developed and rigorously validated a comprehensive suite of machine learning (ML) models, including tree-based, linear, kernel, neural network, and ensemble classifiers, for predicting incident HTN using data from our internal cohort (n=8,054, Iran). All models were benchmarked against the FRS. External validation was conducted on the NHANES cohort (n=6,266, USA) using harmonized features. We systematically assessed model performance across multiple metrics (ROC AUC, PR AUC, F1 score, and Brier score) and evaluated calibration, clinical decision benefit, and learning curves. We conducted extensive subgroup and fairness analyses by sex, education, and socioeconomic status. Interpretability was assessed with SHAP values and permutation importance. Additionally, we provided individualized counterfactual analyses, bootstrap prediction intervals, and patient-level risk vignettes.

Results: ML models, especially our ensembles and LightGBM, significantly outperformed the FRS in discrimination (mean ROC AUC improvement of up to 0.04, with 95% CIs not crossing zero), and external validation confirmed robust generalizability. All models showed minimal performance disparities across demographic and socioeconomic subgroups, and SHAP analyses consistently identified systolic blood pressure (SBP), age, and BMI as strong predictors. Individualized risk vignettes illustrated the clinical nuance of ML predictions compared with categorical clinical scores.
Conclusion: This open, reproducible study demonstrates that modern, interpretable ML models provide significant and equitable improvements over standard clinical risk scores for HTN prediction, with strong external validity and clinical utility. An online risk calculator is provided to facilitate real-world deployment.
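As an illustration of the evaluation described in the Methods, the following is a minimal sketch, not the authors' code. It assumes that an ML model and a baseline score (for example, FRS mapped to a probability) each produce predicted risks for the same patients. It computes the listed metrics with scikit-learn (using average precision as the PR AUC estimate), a paired bootstrap percentile 95% CI for the ROC AUC difference, and decision-curve net benefit at one threshold. The abstract does not state which CI or net-benefit method was used, so these are assumed standard choices. The function names `summarize`, `paired_bootstrap_auc_diff`, and `net_benefit` are hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    f1_score,
    roc_auc_score,
)


def summarize(y_true, prob, threshold=0.5):
    """Discrimination and probabilistic-accuracy metrics for one risk model."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    return {
        "roc_auc": roc_auc_score(y_true, prob),
        # Average precision: a common step-wise estimate of PR AUC.
        "pr_auc": average_precision_score(y_true, prob),
        # F1 requires hard labels, so dichotomize at an assumed threshold.
        "f1": f1_score(y_true, (prob >= threshold).astype(int)),
        "brier": brier_score_loss(y_true, prob),
    }


def paired_bootstrap_auc_diff(y_true, prob_a, prob_b, n_boot=2000, seed=0):
    """Point estimate and percentile 95% CI for AUC(a) - AUC(b).

    Patients are resampled jointly, so both models see the same bootstrap
    sample and the correlation between their predictions is preserved.
    """
    y_true, prob_a, prob_b = map(np.asarray, (y_true, prob_a, prob_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        # AUC is undefined when a resample contains only one class.
        if y_true[idx].min() == y_true[idx].max():
            continue
        diffs.append(
            roc_auc_score(y_true[idx], prob_a[idx])
            - roc_auc_score(y_true[idx], prob_b[idx])
        )
    point = roc_auc_score(y_true, prob_a) - roc_auc_score(y_true, prob_b)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return point, lo, hi


def net_benefit(y_true, prob, pt):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true)
    treat = np.asarray(prob) >= pt
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * pt / (1 - pt)


# Synthetic example: the "ml" score separates the classes better than "baseline".
rng = np.random.default_rng(42)
n = 1000
y = rng.integers(0, 2, size=n)
ml = np.clip(0.3 + 0.4 * y + rng.normal(0, 0.2, n), 0.01, 0.99)
baseline = np.clip(0.4 + 0.2 * y + rng.normal(0, 0.2, n), 0.01, 0.99)

print(summarize(y, ml))
print(paired_bootstrap_auc_diff(y, ml, baseline, n_boot=500))
print(net_benefit(y, ml, pt=0.2))
```

In this sketch, a CI for the AUC difference that excludes zero corresponds to the "95% CI not crossing zero" criterion in the Results. The same `summarize` call could be repeated within each sex, education, or socioeconomic subgroup to compare performance gaps.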

Version published to 10.1101/2025.09.03.25334979 on medRxiv, Sep 5, 2025

