Predicting Early Neonatal Mortality using Machine Learning Models
Abstract
Background: Neonatal mortality is a major issue in global health and is included in the Sustainable Development Goals (SDGs). Early neonatal deaths account for 47% of under-five mortality. A dependable model that predicts early neonatal mortality and identifies its associated risk factors is essential for child survival and for improving children's health outcomes. We applied several machine learning models to predict early neonatal mortality using a comprehensive secondary dataset from Oman.
Methods: Ten machine learning models were trained in three distinct setups: on the original local dataset; with a data-driven approach, the Synthetic Minority Over-Sampling Technique (SMOTE), to address the imbalanced class distribution; and with an algorithm-driven approach, cost-sensitive classification. The goal was to predict early neonatal mortality and identify its associated risk factors. A total of 2,940 de-identified local records of newborn deaths were categorised into early deaths (0–6 days) and late deaths (7–27 days) for model training and testing using 10-fold cross-validation. Model performance was evaluated on accuracy, sensitivity, precision, F1-score, and Area Under the Curve (AUC). Because the dataset was imbalanced, AUC was the pivotal metric for comparing the models.
Results: The analysis revealed that 71.6% of the deaths occurred during the early neonatal period (0–6 days). Logistic Regression (LR) and Linear Discriminant Analysis (LDA) were the top-performing models in two of the three scenarios, with LR achieving an AUC between 0.7085 and 0.7248, and LDA between 0.7057 and 0.7229. The APGAR score at 5 minutes was identified as the most significant predictor of early neonatal mortality.
Conclusion: This study is one of the first to train and evaluate multiple machine learning algorithms under three different scenarios to predict early neonatal mortality and identify associated risk factors using real data from Oman. The results indicate that Logistic Regression and Linear Discriminant Analysis performed best based on their AUC scores. The findings have the potential to inform clinical decision-making and prompt timely interventions to enhance survival rates.
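
The abstract leans on three ideas: AUC as the key metric under class imbalance, SMOTE oversampling, and cost-sensitive classification. The following is a minimal, standard-library Python sketch of each, not the authors' implementation. The class counts (2,105 early and 835 late deaths) are approximations derived here from the reported 71.6% of 2,940 records. The "balanced" weighting rule is a common heuristic assumed for illustration, since the abstract does not state which costs were used. The function names `auc`, `balanced_class_weights`, and `smote` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def balanced_class_weights(labels):
    """Assumed cost rule: weight = n_samples / (n_classes * class_count),
    so errors on the rarer class cost more."""
    classes = sorted(set(labels))
    n = len(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def smote(minority, n_new, k=5, rng=None):
    """Core SMOTE step: pick a minority sample, pick one of its k nearest
    minority neighbours, and interpolate a synthetic point between them."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Other minority samples, sorted by squared Euclidean distance to x.
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Approximate class counts derived from the abstract (assumption):
# 1 = early death (0-6 days), 0 = late death (7-27 days).
labels = [1] * 2105 + [0] * 835

# A trivial model that always predicts "early" with the same score.
acc = accuracy(labels, [1] * len(labels))
auc_trivial = auc(labels, [1.0] * len(labels))

weights = balanced_class_weights(labels)

print(f"trivial accuracy = {acc:.3f}, trivial AUC = {auc_trivial:.3f}")
print(f"class weights = {weights}")
```

In this sketch the always-"early" model reaches an accuracy equal to the majority-class share while its AUC stays at chance level, which illustrates why AUC rather than accuracy is the more informative comparison on data with this class split.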