Prediction of incomplete vaccination among children aged 12-35 months in sub-Saharan Africa: Application of machine learning algorithm using recent (2021-2024) Demographic and Health Survey Data

Abstract

Background: Although immunization prevents millions of deaths worldwide, sub‑Saharan Africa continues to bear a heavy burden of vaccine‑preventable diseases, with nearly one in five children missing essential doses such as the third dose of the diphtheria, tetanus, and pertussis (DTP3) vaccine. Routine immunization programs remain constrained by maternal, household, and contextual barriers, leaving the region far from the Immunization Agenda 2030 (IA2030) targets. Earlier studies relied mainly on descriptive or regression‑based analyses of older datasets, which limited their predictive accuracy. This study applies modern machine learning to recent Demographic and Health Survey (DHS) data (2021–2024) to improve prediction and to identify the key determinants of incomplete vaccination transparently.

Methods: A secondary analysis was conducted using DHS data (2021–2024) from 16 sub‑Saharan African countries. The weighted sample comprised 57,527 children aged 12–35 months. Data cleaning, harmonization, and pooled analysis were performed in STATA 17, and forest plots were used to display pooled and country‑specific incomplete vaccination rates. Eight supervised machine learning algorithms were applied for classification and compared: Naïve Bayes, Decision Tree, K‑Nearest Neighbors, Logistic Regression, Artificial Neural Network, Extreme Gradient Boosting (XGBoost), CatBoost, and Random Forest. SHapley Additive exPlanations (SHAP) analysis was used to improve interpretability by ranking maternal, household, and contextual predictors. The machine learning analyses were conducted in Python 3.10.2 on Google Colab using the scikit‑learn, imblearn, XGBoost, CatBoost, and SHAP packages.

Results: The pooled prevalence of incomplete vaccination among children aged 12–35 months in the 16 countries was 46.21% (95% CI: 38.58–53.83%), ranging from 25.21% in Ghana to 73.21% in the Democratic Republic of the Congo. CatBoost was the best‑performing algorithm, achieving the highest accuracy (65%) and area under the curve (AUC = 0.70) among the models tested. SHAP feature importance analysis identified adequate antenatal care visits, maternal media exposure, institutional delivery, rural residence, health insurance coverage, being married, birth order between two and four, and high household wealth index as the most influential predictors of vaccination status.

Conclusion: Nearly half of children aged 12–35 months in sub‑Saharan Africa remain incompletely vaccinated, with striking disparities across countries, from a relatively low 25.21% in Ghana to an alarming 73.21% in the Democratic Republic of the Congo. CatBoost achieved the best predictive performance of the models tested, and SHAP feature importance analysis showed that adequate antenatal care visits, maternal media exposure, institutional delivery, rural residence, health insurance coverage, being married, and high household wealth index were the most influential predictors of vaccination status. These findings underscore the urgent need for targeted interventions that strengthen maternal health services, expand access to health facilities, and reduce rural–urban inequities. Integrating AI‑driven monitoring into immunization programs offers policymakers actionable tools to accelerate progress toward IA2030 and to safeguard child health.
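The abstract reports a pooled prevalence with a 95% CI across 16 countries but does not state which pooling model was used. The sketch below assumes a DerSimonian–Laird random-effects model on the raw proportion scale, a common choice for pooling country-level prevalences. The function name `pooled_prevalence_dl` and the counts are hypothetical, and the sketch ignores DHS sampling weights and cluster design, which an analysis of real survey data would need to account for.

```python
import math


def pooled_prevalence_dl(events, totals, z=1.96):
    """Pool country-level proportions with a DerSimonian-Laird random-effects model.

    events[i] / totals[i] is the observed proportion (e.g. incompletely vaccinated
    children) in country i. Works on the raw proportion scale and ignores survey
    weights and design effects (simplifying assumptions for illustration).
    Returns (pooled proportion, (lower, upper) Wald CI, tau^2, I^2).
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching events/totals for at least two countries")
    p = [e / n for e, n in zip(events, totals)]
    if any(pi <= 0 or pi >= 1 for pi in p):
        raise ValueError("proportions of exactly 0 or 1 need a continuity correction")
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # within-country variances
    w = [1 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    p_fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

    # Cochran's Q measures between-country heterogeneity.
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # DL estimate of the between-country variance

    # Random-effects weights add tau^2 to every within-country variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, (pooled - z * se, pooled + z * se), tau2, i2


# Hypothetical counts for three countries (not the study's data).
pooled, (lo, hi), tau2, i2 = pooled_prevalence_dl([250, 500, 730], [1000, 1000, 1000])
print(f"pooled = {pooled:.2%}, 95% CI {lo:.2%} to {hi:.2%}, I^2 = {i2:.1%}")
```

With strongly heterogeneous countries, tau² dominates the within-country variances, so the random-effects weights become nearly equal and the confidence interval widens, which is the pattern a wide pooled CI typically reflects.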
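CatBoost was selected on accuracy and AUC. As a minimal, dependency-free illustration of those two metrics, the sketch below computes accuracy at an assumed 0.5 decision threshold and AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half). The model names, labels, and scores are hypothetical. scikit-learn, which the study lists among its packages, provides `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` for real use; the pairwise loop here is O(n²) and only suited to small examples.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(y_true, model_scores, threshold=0.5):
    """Return (name, accuracy, AUC) tuples sorted by AUC, best first."""
    results = []
    for name, scores in model_scores.items():
        y_pred = [1 if s >= threshold else 0 for s in scores]
        results.append((name, accuracy(y_true, y_pred), roc_auc(y_true, scores)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# 1 = incompletely vaccinated; labels and predicted probabilities are made up.
y = [1, 0, 1, 1, 0, 0, 1, 0]
scores = {
    "model_a": [0.6, 0.55, 0.4, 0.7, 0.3, 0.65, 0.5, 0.2],
    "model_b": [0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.1],
}
for name, acc, auc in compare_models(y, scores):
    print(f"{name}: accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together when comparing classifiers.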
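SHAP ranks predictors using Shapley values, which divide a single prediction among the input features. The brute-force sketch below computes exact Shapley values relative to one baseline input by enumerating every coalition of features, then ranks features by mean absolute Shapley value, a common global importance summary. Treating "absent" features as taking a single baseline value is a simplifying assumption (SHAP typically averages over a background dataset), and the toy model, its coefficients, and the function names are hypothetical. The SHAP package uses much faster estimators, including model-specific ones for tree ensembles, because enumeration grows as 2ⁿ in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    A feature outside a coalition is set to its baseline value. Cost grows as
    2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model, rows, baseline):
    """Global importance: mean |Shapley value| of each feature across rows."""
    per_row = [shapley_values(model, r, baseline) for r in rows]
    return [sum(abs(p[j]) for p in per_row) / len(per_row) for j in range(len(baseline))]


# Hypothetical risk score with an interaction between features 0 and 2.
def toy_model(z):
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


baseline = [0, 0, 0]
rows = [[1, 1, 1], [1, 0, 0], [0, 1, 1]]
importance = mean_abs_importance(toy_model, rows, baseline)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print("importance:", [round(v, 3) for v in importance], "ranking:", ranking)
```

For the first row the interaction term is split evenly between features 0 and 2, and the per-feature values sum to the difference between the prediction and the baseline prediction, which is the additivity property SHAP relies on.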