Machine Learning Prediction of Child Stunting and Wasting in Ethiopia Using DHS Data: XGBoost and Random Forest Models with SHAP Interpretability

Abstract

Background: Child malnutrition remains one of the most pressing problems in global public health. Stunting (38%) and wasting (10%) impose a severe developmental and economic burden on young children. Conventional statistical models often cannot capture the complex interrelations among social, demographic, and environmental determinants. Machine learning has real potential for predicting health outcomes, but applying it to complex, nationally representative survey data requires robust methods that keep the results generalizable and interpretable enough to inform policy. Our objective was to build accurate machine learning (ML) models of stunting and wasting in children under 5 years of age from a nationally representative sample and to identify the principal modifiable risk factors for both.

Methods: We used data from the 2016 Ethiopia Demographic and Health Survey (EDHS), comprising a sample of 10,641 children under five years of age, to train Random Forest (RF) and XGBoost models. Stunting was defined as a height-for-age Z-score < -2 SD and wasting as a weight-for-height Z-score < -2 SD. Preprocessing included multiple imputation of missing values and survey-weight adjustments. Model performance was assessed by 10-fold cross-validation. SHAP (SHapley Additive exPlanations) values and partial dependence plots (PDPs) were used to assess feature importance and model interpretability.

Results: XGBoost outperformed Random Forest on every performance measure for both stunting and wasting. For stunting, XGBoost achieved an area under the receiver operating characteristic curve (AUC) of 0.82, compared with 0.79 for Random Forest. For wasting, XGBoost achieved an AUC of 0.76 and Random Forest an AUC of approximately 0.71. The key modifiable predictors identified by SHAP values were household wealth index (SHAP value: 0.19), maternal education (SHAP value: 0.17), and sanitation access (SHAP value: ~0.14). Partial dependence plots showed a 3.2-fold higher risk of stunting in the poorest households (52%) than in the richest (16%).

Conclusions: Machine learning models, especially XGBoost, can predict child malnutrition risk accurately and highlight the main socioeconomic and environmental drivers in Ethiopia, notably household wealth, maternal education, and sanitation. Integrating these models into the national health system could support more precise, targeted, and proactive interventions. This study provides a robust, interpretable framework for predicting child malnutrition risk and demonstrates the utility of combining machine learning with complex survey data to guide public health interventions.
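The outcome definitions and the headline metrics in the abstract can be illustrated with a small, standard-library Python sketch. It is not the authors' pipeline: the function names, the toy Z-scores, and the toy model scores are all hypothetical, and the AUC is computed by the generic pairwise-ranking definition and not with any particular library. The sketch only shows how a Z-score cutoff of -2 SD yields binary labels, how an AUC summarizes a model's ranking of cases, and how the reported 52% versus 16% stunting risk relates to the "3.2-fold" figure.

```python
from typing import Sequence

Z_CUTOFF = -2.0  # threshold stated in the abstract: Z-score < -2 SD


def is_stunted(haz: float) -> bool:
    """Stunting: height-for-age Z-score strictly below -2 SD."""
    return haz < Z_CUTOFF


def is_wasted(whz: float) -> bool:
    """Wasting: weight-for-height Z-score strictly below -2 SD."""
    return whz < Z_CUTOFF


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. This is the generic
    definition of AUC, not the implementation used in the study.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def risk_ratio(risk_group: float, risk_reference: float) -> float:
    """Fold difference in risk between a group and a reference group."""
    return risk_group / risk_reference


# Hypothetical height-for-age Z-scores for six children (illustration only).
haz_values = [-3.1, -2.4, -1.9, -0.5, 0.3, -2.0]
stunting_labels = [int(is_stunted(z)) for z in haz_values]  # -2.0 itself is not below the cutoff

# Hypothetical predicted probabilities from some classifier.
toy_scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
toy_auc = roc_auc(stunting_labels, toy_scores)

# Stunting risk reported in the abstract: 52% (poorest) vs 16% (richest).
wealth_fold = risk_ratio(0.52, 0.16)

print(stunting_labels)
print(round(toy_auc, 3))
print(round(wealth_fold, 2))
```

Dividing the two reported percentages gives about 3.25, which matches the abstract's "3.2-fold" only approximately; the difference presumably reflects rounding of the published percentages.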
