Application of Machine Learning to Predict Teenage Pregnancy in Zambia: Evidence from 2024 Zambia Demographic and Health Surveys

Teebeny Zulu
Nasson Nathan Tembo
Patrick Musonda

Abstract

Background Teenage pregnancy remains a significant public health issue in Zambia, contributing to high maternal and child morbidity and mortality, school dropout, and socioeconomic vulnerability. Identifying its most influential predictors is a key first step in developing targeted interventions. This study therefore applied machine learning (ML) and ordinary logistic regression to determine and explain the most influential predictors of teenage pregnancy in Zambia.

Methods This cross-sectional study used data from the 2024 Zambia Demographic and Health Survey (ZDHS). Feature selection was performed using mutual information scores and the Boruta algorithm in 5-fold cross-validation (CV). The data were split into an 80% training set and a 20% held-out test set. Seven ML models were trained in Python on 80% of the training set using 5-fold CV, and the best-performing model was identified from the averaged metrics and then evaluated on the held-out test set. Model performance was evaluated using accuracy and the area under the receiver operating characteristic curve (AUC-ROC). The SHapley Additive exPlanations (SHAP) method was applied to determine feature importance. Weighted logistic regression in Stata was then used to estimate odds ratios for key predictors.

Results A total of 3292 adolescents were included in the analysis, of whom 26.3% had experienced teenage pregnancy. The Extreme Gradient Boosting (XGB) model outperformed the other models, achieving an accuracy of 0.806 (0.789–0.822) and an AUC-ROC of 0.839 (0.821–0.858). On the held-out test set, the XGB model achieved an accuracy of 0.746 and an AUC-ROC of 0.824. The most influential predictor of teenage pregnancy was the participant’s current age, followed by age of the household head, knowledge of the ovulatory cycle, wealth status, education, type of place of residence, frequency of watching television, internet use, and province. A one-year increase in the participant’s age (OR = 2.20; 95% CI = 2.03–2.37) and knowledge of the ovulatory cycle (OR = 2.70; 95% CI = 2.12–3.42) increased the odds of teenage pregnancy in Zambia. Conversely, the following reduced the odds of teenage pregnancy: a one-year increase in the age of the household head (OR = 0.98; 95% CI = 0.98–0.99), belonging to the richer wealth quintile (OR = 0.58; 95% CI = 0.38–0.90), belonging to the richest quintile (OR = 0.18; 95% CI = 0.10–0.31), having attained secondary or higher education (OR = 0.42; 95% CI = 0.25–0.71), watching television at least once a week (OR = 0.67; 95% CI = 0.47–0.95), watching television almost every day (OR = 0.72; 95% CI = 0.52–0.99), and internet use in the last 12 months (OR = 0.67; 95% CI = 0.49–0.92).

Conclusion Applying ML models alongside conventional modelling could help identify and explain the most influential predictors of teenage pregnancy to support evidence-based, targeted interventions.
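As a reading aid for the odds ratios reported above, the following is a minimal sketch of the arithmetic behind a logistic regression odds ratio. It is not the authors' Stata analysis: the function names are hypothetical, the 3-year difference is an arbitrary illustration, and the sketch assumes the per-year effect is constant on the log-odds scale and that the reported interval is a symmetric 95% Wald interval.

```python
import math


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds scale) implied by an odds ratio."""
    return math.log(odds_ratio)


def compounded_odds_ratio(odds_ratio_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, assuming a constant per-unit effect."""
    return odds_ratio_per_unit ** units


def standard_error_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate standard error of the coefficient, assuming a Wald 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Values reported in the abstract for a one-year increase in participant's age.
age_or, age_ci_lower, age_ci_upper = 2.20, 2.03, 2.37

beta_age = log_odds_coefficient(age_or)
# Illustrative assumption: compare two adolescents whose ages differ by 3 years.
or_three_years = compounded_odds_ratio(age_or, 3)
se_age = standard_error_from_ci(age_ci_lower, age_ci_upper)

print(f"log-odds coefficient per year: {beta_age:.3f}")
print(f"odds ratio for a 3-year difference: {or_three_years:.2f}")
print(f"approximate standard error: {se_age:.3f}")
```

Under these assumptions, an odds ratio of 2.20 per year compounds multiplicatively, so a 3-year age difference corresponds to an odds ratio of about 2.20³ ≈ 10.65. Odds ratios describe odds rather than probabilities, so this should not be read as a tenfold increase in risk.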

Version published to 10.21203/rs.3.rs-8754908/v1 on Research Square
Feb 27, 2026

Related articles

Machine Learning Prediction of Iron Supplement Utilization Among Pregnant Women in Somaliland: Evidence from the Somaliland Demographic and Health Survey 2020

This article has 5 authors:
1. Hamda Jama Yousuf
2. Abdiasis Adem Omar
3. Suhur A. Ahmed
4. Ahmed Abdi Omar
5. Ahmed Abdirahman Farah
This article has no evaluations. Latest version Mar 13, 2026

Predicting Adequate Antenatal Care Utilization Among Pregnant Women in Kenya: A Comparative Machine Learning Study Using the Kenya Demographic and Health Survey

This article has 1 author:
1. Calvince Otieno Ngaji
This article has no evaluations. Latest version Mar 27, 2026

Perinatal Mortality Prediction and Risk Factor Identification Using Machine Learning on Recent Sub-Saharan African DHS Data Affiliations

This article has 8 authors:
1. Tadele Chekol Maru
2. Andualem Enyew
3. Makda Fekadie Tewelgne
4. Eliyas Addisu Taye
5. Agerie Mengistie Zeleke
6. Belayneh Jejaw Abate
7. Deresse Abebe Gebrehana
8. Azanaw Amare Muche
This article has no evaluations. Latest version Mar 30, 2026
