Application of Machine Learning Approaches to Develop Predictive Models for Diabetes and Hypertension among Bangladesh Adults



Abstract

Introduction: With rapid urbanization, lifestyle changes, and an aging population, non-communicable diseases (NCDs), including hypertension and diabetes, pose significant public health challenges in Bangladesh and many other low- and middle-income countries. This study used machine learning (ML) approaches to develop predictive models for hypertension and diabetes among adults in Bangladesh.

Methods: We analyzed data from the Bangladesh Demographic and Health Survey 2022, a nationally representative cross-sectional survey. Participants were classified as hypertensive if their systolic blood pressure was ≥140 mmHg, their diastolic blood pressure was ≥90 mmHg, or they used antihypertensive medication. They were classified as diabetic if their fasting plasma glucose was ≥7.0 mmol/L or they used glucose-lowering drugs. Potential predictors included age, gender, education, wealth quintile, overweight/obesity, rural-urban residence, and division of residence. Descriptive analysis was conducted, and six ML models were applied: artificial neural network (ANN), random forest, adaptive boosting (AdaBoost), gradient boosting, XGBoost, and support vector machine (SVM). Model performance was evaluated using accuracy, area under the curve (AUC), sensitivity, specificity, and F1-score. Feature importance was assessed to rank risk factors.

Results: The study included 13,847 adults, 55% of whom were female. The prevalence of diabetes was 16.3% and that of hypertension was 20.5%. Both increased with age, peaking at 24.4% for diabetes and 43.3% for hypertension among individuals aged 65 and older. Wealthier and urban residents had higher rates (diabetes: 24.9% among the richest compared to 9.9% among the poorest; hypertension: 23.3% in urban versus 19.2% in rural areas). Additionally, overweight/obesity was a strong predictor of both conditions. For diabetes, AdaBoost had the highest AUC (0.699) and SVM had the highest accuracy (0.836); for hypertension, AdaBoost had the highest AUC (0.775) and accuracy (0.799). Hypertension was the top-ranked predictor of diabetes, while overweight/obesity ranked first for hypertension, followed by age and diabetes. Wealth and gender were moderately influential, with education and geographic factors less so. Low specificity across models indicated challenges in identifying non-cases.

Conclusion: In this ML-driven analysis, we identified a bidirectional relationship between hypertension and diabetes, along with several other predictors, including overweight/obesity, older age, and higher household wealth quintiles. Our findings underscore the need for integrated screening and lifestyle interventions targeting high-risk groups to mitigate the future NCD burden.
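
The case definitions and threshold-based metrics described in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `Participant` record, its field names, and the helper functions are hypothetical, and the survey's actual variable names are not given here. AUC is left out because it needs predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; field names are assumptions, not survey variables.
    systolic_mmhg: float
    diastolic_mmhg: float
    fasting_glucose_mmol_l: float
    on_antihypertensives: bool = False
    on_glucose_lowering_drugs: bool = False


def is_hypertensive(p: Participant) -> bool:
    # SBP >= 140 mmHg, DBP >= 90 mmHg, or antihypertensive medication use.
    return p.systolic_mmhg >= 140 or p.diastolic_mmhg >= 90 or p.on_antihypertensives


def is_diabetic(p: Participant) -> bool:
    # Fasting plasma glucose >= 7.0 mmol/L or glucose-lowering drug use.
    return p.fasting_glucose_mmol_l >= 7.0 or p.on_glucose_lowering_drugs


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    # Count the four confusion-matrix cells (1 = case, 0 = non-case).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases correctly ruled out
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    person = Participant(systolic_mmhg=138, diastolic_mmhg=92, fasting_glucose_mmol_l=5.4)
    print(is_hypertensive(person), is_diabetic(person))
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Reporting sensitivity and specificity next to accuracy matters when outcomes are imbalanced: at the reported 16.3% diabetes prevalence, labelling every participant as non-diabetic would already give an accuracy of about 0.837.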
