An Explainable Machine Learning Model for Predicting Necrotizing Enterocolitis in Neonates Based on Complete Blood Count Parameters
Abstract
Background: Necrotizing enterocolitis (NEC) is a major cause of morbidity and mortality in neonates, particularly among preterm infants. Early prediction is crucial for developing personalized treatment strategies and improving patient outcomes. Given the significant limitations of existing scoring systems and prediction models, there remains a pressing need for new models to assess NEC risk. To this end, we developed and validated an interpretable machine learning (ML) model and deployed a web-based calculator for the early prediction of NEC onset in the neonatal intensive care unit (NICU).

Methods: We collected complete blood count parameters measured within the first 24 hours after birth and during the second postnatal week, and calculated their mean values, for 116 infants with NEC and 233 non-NEC infants admitted to the NICU of the Second Affiliated Hospital of Air Force Medical University and the Second Affiliated Hospital of Xi’an Medical University between January 2012 and January 2025. Six ML algorithms were applied to construct classification models as the basis of a predictive tool for NEC. We quantified model performance using metrics including the area under the receiver operating characteristic curve (AUC). The final model was interpreted using SHapley Additive exPlanations (SHAP), which also quantified feature importance.

Results: Among the six ML models evaluated, the XGBoost algorithm performed best. It achieved an AUC of 0.917 (95% CI: 0.858–0.977) and an accuracy of 0.8952 (95% CI: 0.8203–0.9465), against a no information rate of 0.6667. Additional performance metrics included a sensitivity of 0.7429, specificity of 0.9714, positive predictive value (PPV) of 0.9286, negative predictive value (NPV) of 0.8831, precision of 0.9286, recall of 0.7429, and an F1-score of 0.8254. The calibration curve indicated strong agreement between predicted probabilities and observed outcomes.
SHAP analysis was used to identify and rank the contribution of key features to the model's predictions. The final model incorporated seven hematological parameters: mean platelet volume (MPV), red cell distribution width coefficient of variation (RDW-CV), mean corpuscular hemoglobin (MCH), white blood cell count (WBC), neutrophil percentage (NEUT%), platelet distribution width (PDW), and mean corpuscular volume (MCV). We also developed a user-friendly, web-based calculator based on the final XGBoost model, accessible to clinicians at https://nec.yujincheng.cn/.

Conclusion: Leveraging hematological parameters from the first two postnatal weeks, we developed and validated a robust and interpretable XGBoost model for predicting the risk of NEC. This tool facilitates early identification of high-risk neonates by clinicians and provides a foundation for personalizing therapeutic strategies. The study also provides digital support for moving NEC prevention and management towards a more precise, personalized, and proactive paradigm.
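
The threshold-based metrics reported in the Results all derive from a single confusion matrix, so they can be checked against one another. The sketch below is illustrative only: the abstract does not report the confusion-matrix counts, and the values used here (26 true positives, 9 false negatives, 2 false positives, 68 true negatives, for a 105-infant evaluation set) are an assumption, inferred as one set of integers that reproduces the reported rates. The function name `classification_metrics` is likewise hypothetical.

```python
# Assumed counts: NOT reported in the abstract. They were inferred as one
# integer solution consistent with the reported rates (35 NEC, 70 non-NEC).
TP, FN, FP, TN = 26, 9, 2, 68


def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from raw counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # recall: NEC cases correctly flagged
    specificity = tn / (tn + fp)   # non-NEC cases correctly cleared
    ppv = tp / (tp + fp)           # precision: flagged cases that are NEC
    npv = tn / (tn + fn)           # cleared cases that are non-NEC
    return {
        "accuracy": (tp + tn) / total,
        # Accuracy obtained by always predicting the majority class.
        "no_information_rate": max(tp + fn, tn + fp) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Harmonic mean of precision (PPV) and recall (sensitivity).
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


metrics = classification_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, every value rounds to the figure given in the Results. Precision and recall are the same quantities as PPV and sensitivity, which is why they coincide in the abstract. The AUC and the calibration curve depend on the predicted probabilities rather than on a single threshold, so they cannot be recovered from the confusion matrix alone.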