Development of a Hypertension Risk Prediction Model using Nationally Representative Survey Data: A Machine Learning Approach and Web Application Deployment



Abstract

Background: Hypertension is a major modifiable risk factor for cardiovascular disease. Early identification of high-risk individuals using predictive models can facilitate targeted interventions. This study aims to develop and validate a machine learning-based risk prediction model for hypertension using complex, nationally representative survey data and to deploy it as an accessible web application.

Methods: We used data from the WHO STEPS Survey Nepal 2019, which included 5,593 participants. The survey design involved clustering, stratification, and sampling weights, all of which were incorporated into the analysis. Fourteen initial predictors were considered. We used a combination of SMOTENC and KMeansSMOTE to address class imbalance and optimized the prediction threshold for each model. Six machine learning algorithms (Logistic Regression, Naive Bayes, Random Forest, LightGBM, XGBoost, and SVM) were trained and evaluated on AUC-ROC, precision-recall performance, calibration, and clinical utility (Decision Curve Analysis). Model interpretability was assessed using SHAP values. The best model was deployed as an interactive web application using Streamlit.

Results: The prevalence of hypertension in the weighted sample was 26.1%. After feature selection using SHAP analysis, seven key predictors were retained: age, smoking, waist-hip ratio risk, heavy alcohol use, physical activity, fasting blood sugar, and total cholesterol. Logistic Regression demonstrated the best overall performance (AUC-ROC: 0.718, F1-score: 0.552) and was well calibrated. It also offered the highest net benefit across a wide range of clinical thresholds. The model was deployed as a publicly available web application (https://htnrisknepal.streamlit.app/).

Conclusions: We developed a robust, interpretable, and clinically useful hypertension risk prediction model. Deploying the model as an open-access web application bridges the gap between research and practical implementation, enabling its use by healthcare workers and in public health screening initiatives. Our methodology provides a reliable framework for building and deploying predictive models from public health survey data.
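For readers unfamiliar with two of the evaluation steps named in the Methods, the following is a minimal sketch of per-model threshold optimization (here by F1-score) and of the net benefit quantity used in Decision Curve Analysis. It is not the authors' code: the function names, the toy labels and probabilities, and the candidate threshold grid are all illustrative assumptions, and the sketch ignores the survey's sampling weights for simplicity.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the study. Sampling weights are deliberately ignored.

def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, fn) when predicting positive for risk >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    return tp, fp, fn


def f1_score(y_true, y_prob, threshold):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives."""
    tp, fp, fn = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob, candidates):
    """Pick the candidate threshold with the highest F1-score."""
    return max(candidates, key=lambda t: f1_score(y_true, y_prob, t))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of the model: TP/n - FP/n * (pt / (1 - pt))."""
    tp, fp, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat everyone' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical labels (1 = hypertensive) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35, 0.55, 0.05]
candidates = [0.2, 0.3, 0.4, 0.5, 0.6]

chosen = best_f1_threshold(y_true, y_prob, candidates)
model_nb = net_benefit(y_true, y_prob, 0.5)
treat_all_nb = net_benefit_treat_all(y_true, 0.5)
```

A decision curve repeats the net benefit calculation over a range of thresholds and plots the model against the "treat everyone" and "treat no one" (net benefit of zero) strategies; a model is clinically useful at a given threshold when its curve lies above both.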
