Application of Machine Learning (ML) to Predict Under-Five Anemia using the 2018 Zambia Demographic and Health Survey (ZDHS)
Abstract
Background
Accurate prediction of anemia risk in children under five using ML can help reduce the burden of anemia in Zambia. This study applied ML models to predict the risk of anemia in children under five in Zambia.
Methods
This cross-sectional study used data from the 2018 ZDHS. Feature selection was performed using the Boruta algorithm. Several ML models were trained on 80% of the data set, with 5-fold cross-validation. The best-fitting model was selected by comparing accuracy and area under the receiver operating characteristic curve (AUC-ROC). Feature importance was assessed using Shapley additive explanations (SHAP). Logistic regression was then performed in STATA; Python was used for ML modelling.
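The two selection criteria named above, accuracy and AUC-ROC, together with the 80/20 split and 5-fold cross-validation, can be illustrated with a minimal plain-Python sketch. This is not the authors' pipeline: the function names, the 0.5 classification threshold, the random seed and the interleaved fold assignment are all assumptions made for illustration.

```python
import random

def accuracy(y_true, y_score, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def auc_roc(y_true, y_score):
    """AUC-ROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_test_indices(n, train_frac=0.8, seed=0):
    """Shuffle row indices and split them into training and held-out sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary assumption
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

With 7743 children, `train_test_indices(7743)` leaves 6194 rows for training and 1549 for testing. Each candidate model would be scored with `accuracy` and `auc_roc` on every validation fold, and the model with the best average scores retained.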
Results
Of 7743 children under five, 58.5% had anemia. The five most influential predictors of under-five anemia were the child's current age, maternal anemia, current breastfeeding, region and stunting. These predictors were identified mainly with the extreme gradient boosting (XGB) model, which achieved an accuracy of 0.6432 and an AUC-ROC of 0.6729. In logistic regression, the odds of anemia were significantly higher for children who were aged two years or below (OR=2.17, p<0.0001), currently breastfeeding (OR=1.66, p<0.0001) or stunted (OR=1.27, p<0.0001); for children living in Copperbelt (OR=1.37, p=0.014), Luapula (OR=1.91, p<0.0001), North-Western (OR=1.66, p<0.0001), Southern (OR=1.35, p=0.019) or Western (OR=1.47, p=0.005) province; and for children whose mothers had mild (OR=1.25, p=0.006) or moderate (OR=1.67, p<0.0001) anemia. The odds were significantly lower for Protestants (OR=0.77, p=0.001) and Muslims (OR=0.33, p=0.019).
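As a reading aid for the odds ratios above, the sketch below shows how a logistic regression coefficient maps to an odds ratio and how its direction is interpreted. It is a generic illustration, not the authors' STATA analysis: the function names are hypothetical, and the standard error used in the usage note is a placeholder because the abstract does not report one.

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with an approximate 95% Wald confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def direction(or_value):
    """An odds ratio above 1 means higher odds of the outcome,
    below 1 means lower odds, and exactly 1 means no difference."""
    if or_value > 1:
        return "higher odds"
    if or_value < 1:
        return "lower odds"
    return "no difference"
```

For example, `odds_ratio(math.log(2.17), 0.1)` returns an odds ratio of 2.17 with an interval around it; the 0.1 is a made-up standard error used only to show the call. `direction(2.17)` gives "higher odds", while `direction(0.77)` and `direction(0.33)` give "lower odds".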
Conclusion
Extreme gradient boosting was the best-performing ML model for predicting under-five anemia in Zambia, and it identified the child's current age, maternal anemia, current breastfeeding, region and stunting as the five most influential predictors. The model could therefore help address anemia in Zambia by serving as a screening tool for early detection of children with anemia.