Machine Learning-Based Individualized Prediction: Risk Assessment of Retinopathy in Preterm Infants at High Altitude


Abstract

Background: Retinopathy of prematurity (ROP) is one of the leading causes of childhood blindness. The chronic hypoxic environment at high altitude may create a distinct risk profile and may act as a trigger for the onset and progression of ROP. To date, no ROP risk prediction model has been developed specifically for preterm infants in high-altitude regions. This study therefore aimed to develop an ROP prediction model for high-altitude settings using machine learning (ML) methods.

Methods: Clinical data were collected retrospectively from 2,138 premature infants who underwent fundus screening at Qinghai Red Cross Hospital between May 2014 and May 2025. The cohort was divided into a training set (n = 1,470) and a testing set (n = 668) at a 7:3 ratio. Key predictors were selected from 59 candidate variables using univariate analysis and LASSO regression. Nine ML models were then constructed: logistic regression, decision tree, random forest, XGBoost, LightGBM, support vector machine, Gaussian Naive Bayes, multilayer perceptron, and TabNet. Model training and hyperparameter optimization were performed using five-fold cross-validation and Bayesian optimization, and model performance was evaluated on an independent testing set.

Results: LASSO regression identified 11 key predictors: perinatal asphyxia, bronchopulmonary dysplasia (BPD), surfactant administration, gestational age, hyperbilirubinemia, respiratory failure, mode of delivery, premature rupture of membranes, duration of intravenous nutrition, fasting duration, and total bile acids. Every model achieved an area under the receiver operating characteristic curve (AUC) above 0.82 on the testing set. The decision tree model had the highest AUC (0.954, 95% CI: 0.919–0.989), but the random forest model showed the best overall performance (AUC = 0.933, 95% CI: 0.891–0.974; sensitivity = 0.691; specificity = 0.943; F1 score = 0.631). The integrated model also performed robustly (AUC = 0.949). SHAP analysis identified duration of parenteral nutrition, respiratory failure, and gestational age as the most influential predictors.

Conclusions: This study developed and validated an ML prediction model for ROP in preterm infants at high altitude. The random forest model showed the best overall performance and identified infants at high risk of ROP from routine clinical indicators, offering a practical tool for precision screening and early intervention.
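For readers unfamiliar with the reported metrics, the following is a minimal sketch of how AUC, sensitivity, specificity, and F1 score can be computed from predicted risk scores. It is not the authors' code. The toy labels and scores, the 0.5 decision threshold, the function names, and the percentile bootstrap used for the 95% confidence interval are all assumptions made for illustration; the abstract does not state how the intervals or the threshold were obtained.

```python
import random

def sensitivity_specificity_f1(y_true, y_pred):
    """Return (sensitivity, specificity, F1) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sens, spec, f1

def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (an assumed method, for illustration)."""
    auc(y_true, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [y_true[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = ROP, 0 = no ROP; scores are predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.25, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec, f1 = sensitivity_specificity_f1(y_true, y_pred)
toy_auc = auc(y_true, scores)
ci_low, ci_high = bootstrap_auc_ci(y_true, scores, n_boot=200)
```

AUC is threshold-free, whereas sensitivity, specificity, and F1 depend on the chosen cutoff, which is one reason a model with the highest AUC need not have the best overall profile.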
