Predicting Under Five Mortality in Bihar Through Machine Learning and SDG Metrics
Abstract
Child mortality is a vital indicator of a nation’s health and development, closely aligned with the Sustainable Development Goals (SDGs). This study investigates the determinants of under-five mortality in Bihar, India, using data from the National Family Health Survey (NFHS-5, 2019–21). A total of 21,040 records of children born to married women were analyzed using 33 predictor variables selected for their relevance to SDG targets. The research takes a comparative machine learning approach, evaluating the predictive performance of Logistic Regression, Random Forest, K-Nearest Neighbors (KNN), Naïve Bayes, and Support Vector Machine (SVM) models. Random Forest and Naïve Bayes achieved the highest accuracy (98.80% and 98.67%, respectively), with Naïve Bayes attaining perfect recall (100%) and an F1 score of 99.53%, while Random Forest achieved an F1 score of 98.73%. Logistic Regression showed moderate performance, with 76.61% accuracy, 74.92% precision, and an F1 score of 76.31%. KNN achieved 84.12% accuracy and 88.56% precision, but a lower recall of 75.24%. The SVM model performed well, with 86.38% accuracy and a balanced F1 score of 86.46%. AUC-ROC scores ranged from 85.64% (Logistic Regression) to 99.96% (Random Forest), indicating strong discrimination across all models. These findings underscore the potential of machine learning to identify key socio-demographic, economic, and health-related factors influencing child survival. The study provides valuable insights for policymakers aiming to reduce child mortality and achieve SDG targets in Bihar.
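The accuracy, precision, recall, and F1 figures reported above are all derived from a model's confusion matrix. As a minimal sketch of how they relate, assuming binary labels in which 1 marks the positive class, the following computes them from predicted and true labels. The names `ConfusionCounts`, `count_outcomes`, and `classification_metrics`, and the example counts, are illustrative assumptions and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally confusion-matrix cells, assuming labels are 0 or 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fp=fp, fn=fn, tn=tn)


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F1 as fractions in [0, 1].

    A metric whose denominator is zero is reported as 0.0.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts for illustration only: no false negatives, so recall is 1.0,
    # while the false positives pull precision (and therefore F1) below 1.0.
    example = ConfusionCounts(tp=90, fp=10, fn=0, tn=100)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, a model with perfect recall can still have an F1 score below 100% whenever it produces false positives, which is why the metrics are best read together rather than individually.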