Prediction of incomplete vaccination among children aged 12-35 months in sub-Saharan Africa: Application of machine learning algorithm using recent (2021-2024) Demographic and Health Survey Data

Abdulkerim Hassen Moloro
Bizunesh Fantahun Kase
Angwach Abrham Asnake
Alemayehu Kasu Gebrehana
Etsay Woldu Anbesu
Kebede Gemeda Sabo
Kusse Urmale Mare
Abdu Hailu Shibeshi
Andualem Addisu Birlie
Hiwot Altaye Asebe

Abstract

Background: Despite immunization preventing millions of deaths worldwide, sub-Saharan Africa continues to face a heavy burden of vaccine-preventable diseases, with nearly one in five children missing essential doses such as the third dose of the diphtheria, tetanus, and pertussis vaccine. Routine programs remain constrained by maternal, household, and contextual barriers, leaving the region far from achieving the Immunization Agenda 2030 (IA2030) targets. Earlier studies relied mainly on descriptive or regression-based analyses of older datasets, which limited predictive accuracy. This study applies machine learning to recent Demographic and Health Survey (DHS) data (2021–2024) to improve prediction and to identify key determinants of incomplete vaccination transparently.

Methods: A secondary analysis was conducted using recent DHS data (2021–2024) from 16 sub-Saharan African countries. The weighted sample comprised 57,527 children aged 12–35 months. Data cleaning, harmonization, and pooled analysis were performed in STATA 17, with forest plots illustrating pooled and country-specific incomplete vaccination rates. Eight supervised machine learning algorithms (Naïve Bayes, Decision Trees, K-Nearest Neighbor, Logistic Regression, Artificial Neural Networks, Extreme Gradient Boosting (XGBoost), CatBoost, and Random Forest) were applied for classification and compared. SHAP analysis was used to improve interpretability by ranking maternal, household, and contextual predictors. All machine learning analyses were conducted in Python 3.10.2 within Google Colab using the scikit-learn, imblearn, XGBoost, CatBoost, and SHAP packages.

Results: The pooled prevalence of incomplete vaccination among children aged 12–35 months in the 16 sub-Saharan African countries was 46.21% (95% CI: 38.58%, 53.83%), with the lowest level observed in Ghana (25.21%) and the highest in the Democratic Republic of Congo (73.21%). CatBoost was the best-performing algorithm for predicting incomplete childhood vaccination, achieving the highest accuracy (65%) and area under the curve (AUC, 70%) among the models tested. SHAP feature importance analysis showed that adequate antenatal care visits, maternal media exposure, institutional delivery, rural residence, health insurance coverage, being married, birth order between two and four, and high household wealth index were the most influential attributes in predicting vaccination outcomes.

Conclusion: Nearly half of children aged 12–35 months in the 16 sub-Saharan African countries studied remain incompletely vaccinated, with wide disparities across countries, from 25.21% in Ghana to 73.21% in the Democratic Republic of Congo. CatBoost performed best among the models tested (accuracy 65%, AUC 70%), and SHAP feature importance analysis showed that adequate antenatal care visits, maternal media exposure, institutional delivery, rural residence, health insurance coverage, being married, and high household wealth index were the most influential attributes in predicting vaccination outcomes. These findings underscore the need for targeted interventions that strengthen maternal health services, expand access to facilities, and reduce rural-urban inequities. Integrating AI-driven monitoring into immunization programs could give policymakers actionable tools to accelerate progress toward Immunization Agenda 2030 and safeguard child health.
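The abstract ranks the eight classifiers by accuracy and AUC. As a rough illustration of what those two metrics measure, the sketch below computes both from predicted scores using only the Python standard library. The labels, scores, model names, and the 0.5 decision threshold are all invented for illustration; they are not the study's data, models, or settings.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Share of cases whose thresholded prediction matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = incompletely vaccinated) and scores from two hypothetical models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = {
    "model_a": [0.9, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.8],
    "model_b": [0.6, 0.4, 0.3, 0.7, 0.5, 0.6, 0.2, 0.8],
}

# Map each model name to its (accuracy, AUC) pair and print a small comparison.
results = {name: (accuracy(y_true, s), roc_auc(y_true, s)) for name, s in scores.items()}
for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

On this toy data, model_a scores an accuracy of 0.625 and an AUC of 0.75. Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are usually reported together.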

Version published to 10.21203/rs.3.rs-8945443/v1 on Research Square
Mar 10, 2026

Related articles

Longitudinal Analysis of Routine Childhood Vaccination Coverage of Selected Vaccines in Nigeria (1980-2023) using the Global Burden of Disease Study (2023)

This article has 6 authors:
1. Adewunmi Akingbola
2. Abiodun Adegbesan
3. Olajumoke Adewole
4. Emmanuel Nwaeze
5. Joshua Alabi
6. Petra Mariaria
This article has no evaluations. Latest version: Apr 17, 2026

Predicting Adequate Antenatal Care Utilization Among Pregnant Women in Kenya: A Comparative Machine Learning Study Using the Kenya Demographic and Health Survey

This article has 1 author:
1. Calvince Otieno Ngaji
This article has no evaluations. Latest version: Mar 27, 2026

Perinatal Mortality Prediction and Risk Factor Identification Using Machine Learning on Recent Sub-Saharan African DHS Data Affiliations

This article has 8 authors:
1. Tadele Chekol Maru
2. Andualem Enyew
3. Makda Fekadie Tewelgne
4. Eliyas Addisu Taye
5. Agerie Mengistie Zeleke
6. Belayneh Jejaw Abate
7. Deresse Abebe Gebrehana
8. Azanaw Amare Muche
This article has no evaluations. Latest version: Mar 30, 2026