Predicting Adequate Antenatal Care Utilization Among Pregnant Women in Kenya: A Comparative Machine Learning Study Using the Kenya Demographic and Health Survey
Abstract
Background
Adequate antenatal care (ANC) is foundational to reducing maternal and perinatal mortality, yet attendance rates in sub-Saharan Africa remain far below World Health Organization (WHO) recommendations. In Kenya, coverage of four or more ANC visits stands at approximately 54%, masking pronounced disparities across socioeconomic strata and geographic regions. Conventional statistical analyses have identified individual determinants of ANC uptake but fall short of generating individualized risk predictions capable of guiding proactive clinical interventions. Machine learning (ML) algorithms offer a powerful complement to classical approaches by modeling complex, non-linear interactions across multiple predictors.

Objectives
This study aimed to: (i) develop and compare the predictive performance of three supervised ML classifiers, namely Artificial Neural Networks (ANN), Support Vector Machines (SVM), and logistic regression (Generalized Linear Model, GLM), for predicting whether a pregnant woman in Kenya will complete four or more ANC visits; (ii) identify the most influential demographic and socioeconomic determinants of adequate ANC utilization using Random Forest feature importance and binary logistic regression; and (iii) assess the potential of the best-performing model as a clinical decision-support tool.

Methods
Secondary data were drawn from the 2014 Kenya Demographic and Health Survey (KDHS), a nationally representative, stratified two-stage cluster sample comprising 20,964 women aged 15–49 years who had at least one live birth in the five years preceding the survey. The outcome was dichotomized as adequate ANC (≥ 4 visits, coded 1) versus inadequate ANC (< 4 visits, coded 0). After structured data pre-processing (systematic missing-data imputation, outlier treatment, and one-hot encoding), 17 theoretically grounded features were retained. Feature importance was ranked using the Random Forest Gini Index across 79 encoded variables.
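
The outcome coding and one-hot encoding described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the records and the field names (`anc_visits`, `education`, `wealth`) are hypothetical and are not KDHS variable names. It also suggests how a small set of retained features can expand into a larger number of encoded variables, since each categorical feature contributes one indicator column per level.

```python
# Hypothetical records; field names are illustrative, not KDHS variable names.
records = [
    {"anc_visits": 5, "education": "secondary", "wealth": "richer"},
    {"anc_visits": 2, "education": "none", "wealth": "poorest"},
    {"anc_visits": 4, "education": "primary", "wealth": "middle"},
]


def dichotomize_anc(visits: int) -> int:
    """Return 1 for adequate ANC (>= 4 visits), 0 for inadequate (< 4)."""
    return 1 if visits >= 4 else 0


def one_hot_encode(rows, categorical_fields):
    """Expand each categorical field into one 0/1 indicator per observed level."""
    levels = {f: sorted({r[f] for r in rows}) for f in categorical_fields}
    encoded = []
    for r in rows:
        row = {}
        for f in categorical_fields:
            for level in levels[f]:
                row[f"{f}_{level}"] = 1 if r[f] == level else 0
        encoded.append(row)
    return encoded


labels = [dichotomize_anc(r["anc_visits"]) for r in records]
features = one_hot_encode(records, ["education", "wealth"])

print(labels)            # binary outcome per woman
print(sorted(features[0]))  # names of the encoded indicator columns
```

In this toy example, two categorical features with three observed levels each yield six indicator columns.
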
All models were trained on a stratified 70% training set and evaluated on a 30% hold-out test set, with hyperparameter optimization performed through 10-fold cross-validation. Model discrimination was quantified using the area under the receiver operating characteristic curve (AUC-ROC). Binary logistic regression was additionally used for inferential analysis of determinants, with findings reported as odds ratios (ORs) and 95% confidence intervals (CIs).

Results
The ANN achieved the highest overall predictive accuracy (82.9%) and AUC-ROC (83.33%), outperforming SVM (accuracy 82.7%, AUC 83.04%) and GLM (accuracy 82.2%, AUC 83.04%). The timing of first ANC visit emerged as the dominant predictor, with each additional month of delay reducing the odds of adequate utilization by 76.7% (OR = 0.233; 95% CI: 0.220–0.246; p < 0.001). Poverty (OR = 0.795; p < 0.001), lack of education (OR = 0.765 for no schooling vs. primary; p < 0.001), and older age (OR = 0.979 per year; p < 0.001) were significant negative determinants. Conversely, higher education (OR = 1.897; p < 0.001), having a marital partner (OR = 1.538; p < 0.001), facility-based delivery (OR = 1.211; p = 0.035), and greater parity (OR = 1.097; p < 0.001) were positively associated with 4+ ANC attendance.

Conclusions
Artificial Neural Networks provide the strongest predictive model for ANC utilization in the Kenyan context. Socioeconomic inequality, limited formal education, and absence of partner support remain the primary structural barriers to adequate ANC uptake. Health policies should prioritize conditional financial support for impoverished women, male partner engagement programs, and initiatives promoting early first-trimester ANC initiation. Validation of the ANN model on the 2022 KDHS and deployment as a mobile-based clinical screening tool are priority directions for future research.
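
The reported odds ratios can be read as percent changes in odds, which is how an OR of 0.233 corresponds to a 76.7% reduction per month of delay. The sketch below shows that arithmetic; the function names are hypothetical helpers introduced here. The compounding over several months assumes that timing enters the logistic model as a single linear term, which the per-month OR suggests but the abstract does not state explicitly.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into the percent change in odds per unit increase."""
    return (odds_ratio - 1.0) * 100.0


def cumulative_odds_ratio(odds_ratio: float, units: int) -> float:
    """Odds ratio for a multi-unit change, assuming a linear term on the log-odds scale."""
    return odds_ratio ** units


timing_or = 0.233        # per additional month of delay, as reported
timing_ci = (0.220, 0.246)  # reported 95% CI

point = percent_change_in_odds(timing_or)
ci_as_percent = tuple(percent_change_in_odds(b) for b in timing_ci)
log_odds_coefficient = math.log(timing_or)  # the underlying logistic coefficient
two_month_or = cumulative_odds_ratio(timing_or, 2)

print(round(point, 1))                         # -76.7
print([round(b, 1) for b in ci_as_percent])    # [-78.0, -75.4]
print(round(two_month_or, 3))                  # 0.054
```

The same conversion applied to the reported OR for higher education (1.897) gives an 89.7% increase in the odds of adequate ANC attendance.
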