Machine learning of laboratory parameters to predict mortality risk in pediatric hemophagocytic lymphohistiocytosis: A retrospective single-center study
Abstract
Background
This study aims to use machine learning algorithms to screen key peripheral blood laboratory indicators and to develop and validate a model that predicts the 30-day mortality risk after diagnosis of hemophagocytic lymphohistiocytosis (HLH) in children. The goal is to provide a scientific basis for the early clinical identification of high-risk patients.

Methods
This retrospective cohort study included 133 children diagnosed with HLH at the Children's Medical Center of the Affiliated Hospital of Guangdong Medical University between January 1, 2015, and December 30, 2024. The primary endpoint was survival within 30 days of diagnosis, and patients were classified accordingly into a mortality group (n = 29) and a survival group (n = 104). Baseline laboratory indicators measured on the day of diagnosis or within the preceding 24 hours were collected. The dataset was randomly partitioned into a training set and a validation set at a 7:3 ratio. Candidate predictors were first screened by univariate analysis; principal component analysis (PCA) and variance inflation factor (VIF) assessment were then used to remove redundant variables and isolate key predictors. Six machine learning models, including LightGBM, XGBoost, and logistic regression, were built on the selected features, and their performance was evaluated with metrics including the area under the curve (AUC) and the F1 score. The SHapley Additive exPlanations (SHAP) method was used to interpret the model, and a visual nomogram and an online risk calculator were constructed from it.

Results
Five core predictive variables were identified: procalcitonin (PCT), mean corpuscular hemoglobin (MCH), aspartate aminotransferase (AST), C-reactive protein (CRP), and activated partial thromboplastin time (APTT). Among the six models, LightGBM was the most robust in the feature-decrement experiment (validation set AUC = 0.823).
SHAP analysis showed that APTT and MCH contributed most to the predictions; higher APTT, AST, PCT, and CRP values and lower MCH values were associated with a higher predicted mortality risk. The risk stratification tool derived from this model significantly distinguished high-risk from low-risk patients in both the training and validation sets.

Conclusions
A prediction model built with PCA-based feature screening and the LightGBM algorithm can use routine peripheral blood indicators to quantitatively assess early mortality risk in pediatric HLH. The online calculator developed from this model shows substantial value as an aid to clinical decision-making.
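The Methods state only that the cohort was randomly split 7:3. With 29 deaths among 133 children, one common way to keep the mortality proportion similar in both sets is a stratified split. Whether the authors stratified is not stated, so the following is a minimal sketch under that assumption; the function name `stratified_split` and the seed are hypothetical, and only the group sizes come from the abstract.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=42):
    """Split row indices into (train, valid), sampling train_frac of each
    outcome class separately so both sets keep a similar mortality rate.
    Assumption: the study's split may not have been stratified."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(train_frac * len(idx))  # e.g. round(0.7 * 29) = 20 deaths in training
        train.extend(idx[:k])
        valid.extend(idx[k:])
    return sorted(train), sorted(valid)

# Group sizes from the abstract: 29 deaths (label 1) and 104 survivors (label 0).
labels = [1] * 29 + [0] * 104
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))       # 93 40
print(sum(labels[i] for i in valid_idx))    # 9 deaths in the validation set
```

Stratifying matters more than usual here: with only 29 deaths, an unstratified random split can leave noticeably more or fewer deaths in the validation set, which makes validation AUC and F1 less stable.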
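VIF screening flags predictors that are largely explained by the other predictors: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others. The abstract does not give the cutoff or procedure, and the PCA step is not shown here. The numpy sketch below therefore assumes a complete numeric matrix with no constant columns, iterative removal of the worst predictor, and the commonly used threshold of 10. The function names `vif` and `drop_collinear` and the synthetic data are illustrative only.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X: 1 / (1 - R^2) from an OLS fit
    (with intercept) of that column on all other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ beta) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)  # assumes column is not constant
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_collinear(X: np.ndarray, names: list[str], threshold: float = 10.0) -> list[str]:
    """Repeatedly drop the predictor with the largest VIF until all VIFs <= threshold.
    The threshold of 10 is an assumed convention, not taken from the study."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic illustration (not study data): "c" is almost exactly a + b.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, c])
print(vif(X))                               # all three VIFs are very large
print(drop_collinear(X, ["a", "b", "c"]))   # two of the three predictors survive
```

Removing one predictor at a time, rather than all predictors above the cutoff at once, avoids discarding a whole group of correlated laboratory values when dropping a single member would resolve the redundancy.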
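The abstract reports discrimination as AUC and F1 and describes separating high-risk from low-risk patients. Given predicted probabilities from any fitted model (LightGBM in the study), these quantities can be computed as in the dependency-free sketch below. The 0.5 cutoff, the function names, and the toy scores are assumptions for illustration; the abstract does not state how the study's risk threshold was chosen.

```python
def auc_rank(y_true, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen death (label 1) scores higher than a randomly chosen
    survivor (label 0), counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def f1_at(y_true, scores, threshold=0.5):
    """F1 score when scores >= threshold are predicted as deaths (assumed cutoff)."""
    pred = [s >= threshold for s in scores]
    tp = sum(1 for y, hit in zip(y_true, pred) if y == 1 and hit)
    fp = sum(1 for y, hit in zip(y_true, pred) if y == 0 and hit)
    fn = sum(1 for y, hit in zip(y_true, pred) if y == 1 and not hit)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def stratify(y_true, scores, threshold=0.5):
    """Return (group size, observed mortality) for high- and low-risk groups."""
    def summary(group):
        return (len(group), sum(group) / len(group) if group else float("nan"))
    high = [y for y, s in zip(y_true, scores) if s >= threshold]
    low = [y for y, s in zip(y_true, scores) if s < threshold]
    return {"high_risk": summary(high), "low_risk": summary(low)}

# Toy example (hypothetical scores, not study output).
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]
print(auc_rank(y_toy, p_toy))   # 0.75
print(f1_at(y_toy, p_toy))      # 0.666... (tp=1, fp=0, fn=1)
print(stratify(y_toy, p_toy))   # high_risk: (1, 1.0), low_risk: (3, 0.333...)
```

AUC depends only on how the scores rank patients, whereas F1 and the high/low split both depend on the chosen cutoff, so a model can have a good AUC yet a modest F1 if the threshold is poorly placed for a cohort where deaths are the minority class.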