Predicting Under-Five Mortality in Bihar Through Machine Learning and SDG Metrics
Abstract
Child mortality is a vital indicator of a nation’s health and development, closely aligned with the Sustainable Development Goals (SDGs). This study investigates the determinants of under-five mortality in Bihar, India, using data from the National Family Health Survey (NFHS-5, 2019–21). A total of 21,040 records of children born to married women were analyzed using 33 predictor variables selected for their relevance to SDG targets. The research takes a comparative machine learning approach, evaluating the predictive performance of Logistic Regression, Random Forest, K-Nearest Neighbors (KNN), Naïve Bayes, and Support Vector Machine (SVM) models. Random Forest and Naïve Bayes achieved the highest accuracy (98.80% and 98.67%, respectively). Naïve Bayes attained perfect recall (100%) and an F1 score of 99.53%, while Random Forest achieved an F1 score of 98.73%. Logistic Regression showed moderate performance, with 76.61% accuracy, 74.92% precision, and an F1 score of 76.31%. KNN achieved 84.12% accuracy and 88.56% precision but a lower recall of 75.24%. The SVM model performed well, with 86.38% accuracy and a balanced F1 score of 86.46%. AUC-ROC scores ranged from 85.64% (Logistic Regression) to 99.96% (Random Forest), indicating strong discrimination for every model. These findings underscore the potential of machine learning for identifying the key socio-demographic, economic, and health-related factors that influence child survival. The study provides insights for policymakers aiming to reduce child mortality and achieve the SDG targets in Bihar.
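To make the reported metrics concrete, here is a minimal standard-library Python sketch of how accuracy, precision, recall, F1, and AUC-ROC are conventionally computed for a binary outcome. It assumes label 1 marks a child who died before age five and a 0.5 decision threshold. The function names, the threshold, and the toy labels and scores are hypothetical illustrations and are not the study's data or code.

```python
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (death before age five) as positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Threshold-dependent metrics computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall = 1.0 means no false negatives
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 deaths (label 1) and 5 survivors (label 0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.2, 0.1, 0.3, 0.4]  # predicted risk of death
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.875, precision 0.75, recall 1.0, f1 ≈ 0.857
print(round(roc_auc(y_true, scores), 3))  # 0.933
```

The toy case has one false positive and no false negatives. It shows how perfect recall can coexist with imperfect precision. It also shows why the threshold-free AUC-ROC is reported separately from the threshold-dependent metrics.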