Development of a Hypertension Risk Prediction Model using Nationally Representative Survey Data: A Machine Learning Approach and Web Application Deployment

Sandip Pandey
Asmit Pandey
Aakash Neupane
Deepak Subedi
Aashish Guragain

Abstract

Background

Hypertension is a major modifiable risk factor for cardiovascular diseases. Early identification of high-risk individuals using predictive models can facilitate targeted interventions. This study aims to develop and validate a machine learning-based risk prediction model for hypertension using complex, nationally representative survey data and to deploy it as an accessible web application.

Methods

We used data from the WHO STEPS Survey Nepal 2019, which included 5,593 participants. The survey design involved clustering, stratification, and sampling weights, all of which were incorporated into the analysis. Fourteen candidate predictors were initially considered. We combined SMOTENC and KMeansSMOTE to address class imbalance and optimized the prediction threshold for each model. Six machine learning algorithms (Logistic Regression, Naive Bayes, Random Forest, LightGBM, XGBoost, and SVM) were trained and evaluated on discrimination (area under the receiver operating characteristic curve, AUC ROC), precision-recall performance, calibration, and clinical utility (Decision Curve Analysis). Model interpretability was assessed using SHAP values. The best-performing model was deployed as an interactive web application using Streamlit.
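
The Methods state that a prediction threshold was optimized for each model but do not name the criterion. As a minimal sketch, assuming the threshold was chosen to maximize F1 (the metric reported in the Results) on held-out predicted probabilities, the search can be written in plain Python. The function names and the optional use of sampling weights in the confusion-matrix counts are illustrative assumptions, not the authors' code.

```python
from typing import Optional, Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                    threshold: float,
                    weights: Optional[Sequence[float]] = None) -> float:
    """F1 score when predicting 'hypertensive' for probability >= threshold.

    If weights are given (hypothetically, survey sampling weights), each
    participant's confusion-matrix contribution is scaled by its weight.
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    tp = fp = fn = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += w
        elif predicted_positive and y == 0:
            fp += w
        elif not predicted_positive and y == 1:
            fn += w
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def best_f1_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return the (threshold, F1) pair that maximizes F1."""
    best_t, best_f1 = 0.5, -1.0
    # F1 can only change at an observed probability, so every distinct
    # predicted probability is a candidate cut-off.
    for t in sorted(set(y_prob)):
        score = f1_at_threshold(y_true, y_prob, t, weights)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    # Toy validation-set predictions (illustrative only, not study data).
    y_true = [0, 0, 1, 1, 0, 1]
    y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.60]
    t, f1 = best_f1_threshold(y_true, y_prob)
    print(f"threshold={t:.2f}, F1={f1:.3f}")  # threshold=0.35, F1=0.857
```

Because SMOTENC and KMeansSMOTE change the class balance of the training data, a threshold like this would typically be tuned on validation predictions that were not oversampled, so the chosen cut-off reflects the real prevalence.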

Results

The prevalence of hypertension in the weighted sample was 26.1%. After feature selection using SHAP analysis, seven key predictors were retained: age, smoking, waist-hip ratio risk, heavy alcohol use, physical activity, fasting blood sugar, and total cholesterol. Logistic Regression demonstrated the best overall performance (AUC ROC: 0.718; F1-score: 0.552) and was well calibrated. It also offered the highest net benefit across a wide range of clinical thresholds. The model was deployed as a publicly available web application (https://htnrisknepal.streamlit.app/).
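
The net-benefit comparison above comes from decision curve analysis, in which a model's net benefit at threshold probability p_t is TP/N − (FP/N) · p_t / (1 − p_t), compared against classifying everyone as high risk ("treat all") or no one ("treat none", net benefit 0). A minimal plain-Python sketch follows; the function names and the optional survey-weight argument are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float,
                weights: Optional[Sequence[float]] = None) -> float:
    """Net benefit at threshold probability pt.

    NB = TP/N - (FP/N) * pt / (1 - pt), where participants with predicted
    risk >= pt are classified as high risk.
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(y_true)
    n = sum(weights)
    rows = list(zip(y_true, y_prob, weights))
    tp = sum(w for y, p, w in rows if p >= pt and y == 1)
    fp = sum(w for y, p, w in rows if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def treat_all_net_benefit(y_true: Sequence[int], pt: float,
                          weights: Optional[Sequence[float]] = None) -> float:
    """Reference strategy that classifies every participant as high risk."""
    return net_benefit(y_true, [1.0] * len(y_true), pt, weights)


def decision_curve(y_true: Sequence[int], y_prob: Sequence[float],
                   thresholds: Sequence[float],
                   weights: Optional[Sequence[float]] = None
                   ) -> List[Tuple[float, float, float]]:
    """(pt, model NB, treat-all NB) per threshold; treat-none NB is always 0."""
    return [(pt, net_benefit(y_true, y_prob, pt, weights),
             treat_all_net_benefit(y_true, pt, weights)) for pt in thresholds]


if __name__ == "__main__":
    # Toy predictions (illustrative only, not study data).
    y_true = [0, 0, 1, 1, 0, 1]
    y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.60]
    for pt, nb_model, nb_all in decision_curve(y_true, y_prob, [0.25, 0.50]):
        print(f"pt={pt:.2f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
    # pt=0.25  model=0.444  treat-all=0.333
    # pt=0.50  model=0.333  treat-all=0.000
```

A model is useful at a given p_t when its net benefit exceeds both the treat-all value and zero; evaluating these functions over a grid of thresholds and plotting the three curves gives the decision curve.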

Conclusions

We developed a robust, interpretable, and clinically useful hypertension risk prediction model. The deployment of the model as an open-access web application bridges the gap between research and practical implementation, enabling its use by healthcare workers and for public health screening initiatives. Our methodology provides a reliable framework for building and deploying predictive models from public health survey data.

Version published on medRxiv (doi: 10.1101/2025.08.30.25334758), Sep 3, 2025.
