Predicting abnormal birth weight and identifying its determinants using machine learning in the Hararghe Health and Demographic Surveillance System, Ethiopia
Abstract
Background Birth weight is a reliable measure of intrauterine growth and an important predictor of the newborn’s survival, growth, and development. Globally, millions of babies are born with abnormal birth weight (15.5% of live births with low birth weight and 10% with macrosomia), the majority of them in sub-Saharan Africa. Ethiopia is no exception. This study applied machine learning methods, which can capture the complex and non-linear relationships among factors affecting the birth weight of the newborn, to data from the Hararghe Health and Demographic Surveillance System and obtained strong predictive performance.
Methods The study developed predictive models for abnormal birth weight using retrospective cross-sectional data from the Hararghe Health and Demographic Surveillance System, collected from 2015 to 2022. All singleton births were included; those with missing birth weight data were excluded. Six machine learning models that previous studies had found effective were built and compared to identify the best-performing model for abnormal birth weight prediction. Candidate features for all models were selected on the basis of prior observational studies and expert opinion. The synthetic minority oversampling technique (SMOTE) was used to address class imbalance in the dataset. The dataset was divided into training (80%) and testing (20%) subsets to ensure independent model evaluation. Hyperparameter tuning was performed using grid search combined with 10-fold cross-validation to optimize model performance and reduce overfitting. The area under the receiver operating characteristic curve (AUROC), accuracy, precision, recall, F1-score, and kappa were determined. Feature importance was analyzed using SHapley Additive exPlanations (SHAP) values.
Results Descriptive analysis of 11,553 singleton births showed that 10.78% of the newborns had low birth weight (LBW) and 9.28% had high birth weight (HBW).
The eXtreme Gradient Boosting (XGBoost) model performed best for abnormal birth weight prediction, achieving an AUROC of 0.835, an accuracy of 0.72, a precision of 0.67, a recall of 0.54, an F1-score of 0.63, and a kappa of 0.52. Feature importance analysis showed that the top predictors of low birth weight (LBW) were maternal educational status, age at first delivery, and antenatal care (ANC) visits, while high birth weight (HBW) was most strongly predicted by ANC visits, maternal literacy status, age at first delivery, and maternal education.
Conclusion Machine learning methods for predicting abnormal birth weight yielded promising results that could have a significant public health impact. However, further research that includes more comprehensive predictors, which are missing from the Health and Demographic Surveillance System (HDSS), is needed to draw firmer conclusions.
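The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score, and kappa) can all be derived from a confusion matrix. The sketch below illustrates the standard definitions only and is not the authors' code. It assumes that "kappa" refers to Cohen's kappa and that per-class scores are macro-averaged; the function name `classification_metrics` and the three-class counts (LBW, normal, HBW) are hypothetical and are not study data.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]  # row sums
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums

    # Accuracy: share of samples on the diagonal.
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / pred_counts[k] if pred_counts[k] else 0.0
        recall = tp / true_counts[k] if true_counts[k] else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, rows = true class, columns = predicted class,
    # in the order LBW, normal, HBW. Illustrative only.
    hypothetical_cm = [
        [150, 80, 20],
        [90, 1600, 110],
        [15, 70, 130],
    ]
    for name, value in classification_metrics(hypothetical_cm).items():
        print(f"{name}: {value:.3f}")
```

Kappa is a useful companion to accuracy on imbalanced data such as birth weight categories, because it discounts the agreement a model would reach by chance given the class frequencies.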