Pilot Study of Hypertension Screening and Machine-Learning Prediction Using Community Outreach Data from Nkpokiti, Enugu, Nigeria

Short Title: Machine Learning Prediction of Hypertension Using Community Blood Pressure Data in Nigeria


Abstract

Background: Hypertension often goes undiagnosed in low-income countries. Machine learning (ML) may enhance triage during community screening, but evidence from outreach programs in African settings is limited. This pilot study investigated hypertension prevalence and the feasibility of ML prediction using minimal, locally collected data from a Nigerian outreach program.

Methods: We performed a cross-sectional analysis of anonymized data from a community outreach in Nkpokiti, Enugu. Eligible records (n = 115) included age, sex, and at least one paired systolic/diastolic blood pressure (BP) measurement. Hypertension was defined as mean systolic BP (SBP) ≥ 140 mmHg and/or diastolic BP (DBP) ≥ 90 mmHg. Predictors were age, sex, first SBP, and pulse pressure. We trained penalized logistic regression (primary), random forest, and gradient-boosting models using nested 5-fold cross-validation for hyperparameter tuning; final illustrative results are reported on a stratified 80/20 hold-out. Discrimination (AUC), calibration (Brier score), and classification metrics were calculated with bootstrap confidence intervals.

Results: Median age was 33 years (mean 36.4, SD 14.4); 71.3% of participants were female. The prevalence of hypertension was 25.2% (29/115) and rose with age: 12.7% (< 40 y), 52.0% (40–59 y), and 54.5% (≥ 60 y). SBP alone yielded an AUC of 0.865 (95% CI 0.777–0.941). On the hold-out set (n = 23; 6 positives), penalized logistic regression achieved an AUC of 0.941 (bootstrap mean 0.939; 95% CI 0.800–1.000), accuracy of 0.783, and Brier score of 0.094. Random forest achieved an AUC of 0.961, accuracy of 0.826, and Brier score of 0.088. Gradient boosting showed perfect discrimination on this small hold-out set (AUC 1.000) with a Brier score of 0.038, probably reflecting optimistic estimation due to the small sample size.

Conclusion: In this pilot study, ML models using age, sex, and simple BP measures demonstrated excellent discrimination for hypertension, supporting the feasibility of context-specific, low-cost triage tools. These findings are exploratory; external validation in larger, representative African samples is needed before deployment.
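The outcome definition and the evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not state which software or bootstrap variant was used, so the percentile bootstrap, the function names, and the toy records below are all assumptions made for illustration. The sketch also assumes the DBP threshold is applied to the mean of the available readings, which the abstract leaves implicit.

```python
import random
from statistics import mean


def label_hypertension(sbp_readings, dbp_readings):
    """Outcome rule from the abstract: mean SBP >= 140 and/or mean DBP >= 90 mmHg.

    Applying the mean to DBP as well is an assumption of this sketch.
    """
    return mean(sbp_readings) >= 140 or mean(dbp_readings) >= 90


def pulse_pressure(sbp, dbp):
    """Pulse pressure is systolic minus diastolic pressure (mmHg)."""
    return sbp - dbp


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return mean((p - (1.0 if y else 0.0)) ** 2 for y, p in zip(labels, probs))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed variant, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # resample has only one class, so AUC is undefined; redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy records (SBP readings, DBP readings); NOT study data.
records = [
    ([118], [76]), ([152, 148], [94, 90]), ([135], [85]), ([160], [100]),
    ([124, 120], [80, 78]), ([142, 136], [88, 86]), ([110], [70]), ([138], [92]),
]
labels = [label_hypertension(s, d) for s, d in records]
first_sbp = [s[0] for s, _ in records]  # "first SBP" used as a single-variable score

print("AUC (first SBP only):", auc(labels, first_sbp))
print("95% CI:", bootstrap_auc_ci(labels, first_sbp))
```

With a hold-out set as small as the one reported (n = 23; 6 positives), resampled intervals of this kind are wide and can reach 1.000, which is consistent with the caution the abstract attaches to its own estimates.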
