Comprehensive, Transparent, and Fair Machine Learning Models for Hypertension Risk Prediction: Benchmarking With Framingham, External Validation, Individual-Level Analysis, and Equitable Clinical Utility

Abstract

Background: Hypertension (HTN) is a leading, yet often underdiagnosed, cause of cardiovascular disease worldwide. Clinical risk scores such as the Framingham Risk Score (FRS) are commonly used, but their limited capacity to capture complex risk patterns and their poor generalizability hinder prediction and prevention in resource-limited settings.
Methods: We developed and validated a suite of machine learning (ML) models, including tree-based, linear, kernel, neural network, and ensemble classifiers, for incident HTN prediction using data from our internal cohort (n=8,054, Iran). All models were benchmarked against the FRS. External validation was conducted on the NHANES cohort (n=6,266, USA) using harmonized features. We assessed model performance across multiple metrics (ROC AUC, PR AUC, F1, and Brier score) and evaluated calibration, clinical decision benefit, and learning curves. We conducted subgroup and fairness analyses by sex, education, and socioeconomic status. Interpretability was assessed via SHAP values and permutation importance. We also provide individualized counterfactual analyses, bootstrap prediction intervals, and patient-level risk vignettes.
Results: ML models, especially our ensembles and LightGBM, significantly outperformed the FRS in discrimination (mean ROC AUC improvement up to 0.04, 95% CI not crossing zero), with generalizability confirmed in external validation. All models showed minimal performance disparities across demographic and socioeconomic subgroups, and SHAP analyses identified systolic blood pressure (SBP), age, and body mass index (BMI) as consistently strong predictors. Individualized risk vignettes illustrated the clinical nuance of ML predictions compared with categorical clinical scores.
Conclusion: This open, reproducible study demonstrates that modern, interpretable ML models provide significant and equitable improvements over standard clinical risk scores for HTN prediction, with strong external validity and clinical utility. An online risk calculator is provided to facilitate real-world deployment.
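
The abstract reports an ROC AUC improvement with a 95% CI not crossing zero, alongside the Brier score. It does not say how that interval was computed, so the following is only an illustrative sketch of one common approach, a paired percentile bootstrap, and not the authors' code. The function names (`roc_auc`, `brier_score`, `bootstrap_auc_diff`) and the defaults (`n_boot`, `alpha`, `seed`) are assumptions made for this example.

```python
import random


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def bootstrap_auc_diff(y_true, scores_a, scores_b,
                       n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUC(a) - AUC(b).

    Resampling is paired: each bootstrap draw uses the same individuals
    for both models, so the interval reflects their correlated errors.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only one class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(roc_auc(y, a) - roc_auc(y, b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = roc_auc(y_true, scores_a) - roc_auc(y_true, scores_b)
    return point, lo, hi
```

Calling `bootstrap_auc_diff(y, ml_scores, frs_scores)` on held-out outcomes and the two models' predicted risks returns the observed difference and its interval; a lower bound above zero corresponds to a CI "not crossing zero". The pairwise AUC here is quadratic in sample size, so a rank-based implementation would be preferable at cohort scale.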
