A machine learning model for predicting severe Mycoplasma pneumoniae pneumonia in school-aged children

Abstract

Objective: To develop an interpretable machine learning (ML) model for predicting severe Mycoplasma pneumoniae pneumonia (SMPP) and to identify reliable factors for predicting the clinical type of the disease.

Methods: We collected clinical data from 483 school-aged children with M. pneumoniae pneumonia (MPP) who were hospitalized at the Children's Hospital of Soochow University between September 2021 and June 2024. Difference analysis and univariate logistic regression were used to select predictors as training features. Eight ML algorithms were used to build models on the selected features, and their effectiveness was validated. Model performance was evaluated using the area under the curve (AUC), accuracy, five-fold cross-validation, and decision curve analysis (DCA). Finally, the best-performing model was selected, and the Shapley Additive Explanations (SHAP) method was applied to rank the importance of clinical features and interpret the final model.

Results: After feature selection, 30 variables remained. Of the eight ML models, the CatBoost model showed the best predictive performance, with an AUC of 0.934 and an accuracy of 0.9175. In the DCA comparison of clinical benefit, the CatBoost model provided greater net benefit than the other ML models within the threshold probability range of 34–75%. The SHAP method was applied to interpret the CatBoost model, and the SHAP diagram was used to visualize the influence of the predictor variables on the outcome. The top six risk factors were the number of days with fever, D-dimer, platelet count (PLT), C-reactive protein (CRP), lactate dehydrogenase (LDH), and the neutrophil-to-lymphocyte ratio (NLR).

Conclusions: The interpretable CatBoost model can help physicians accurately identify school-aged children with SMPP. Early identification facilitates better treatment options and timely prevention of complications. Furthermore, the SHAP algorithm enhances the model's transparency and increases its trustworthiness in practical applications.
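
The abstract compares models by net benefit in a decision curve analysis but does not spell out the calculation. As a minimal sketch, assuming the standard DCA definition (net benefit = TP/n − FP/n × pt/(1 − pt) at threshold probability pt), the quantity can be computed as follows. The function names and the toy labels and probabilities are hypothetical illustrations, not the study's code or data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold.

    Assumes the standard decision curve analysis definition:
    TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold weight the harm of a false positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = severe case, 0 = non-severe); not from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.5, 0.05]

# Thresholds chosen to span the 34-75% range mentioned in the abstract.
for pt in (0.34, 0.50, 0.75):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, which is zero by definition.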