Interpretable Machine Learning Model for Pediatric Primary Nephrotic Syndrome Risk Prediction

Abstract

[Background] Primary nephrotic syndrome (NS) is a common chronic kidney disease in children, characterized by complex pathogenesis, heterogeneous clinical manifestations, and a tendency to relapse. Clinical diagnosis currently relies mainly on symptoms and laboratory tests, and efficient, accurate tools for early risk prediction are lacking, which limits early intervention and individualized management. Machine learning prediction models built on multidimensional clinical data offer a new route to the early identification and targeted intervention of NS.

[Methods] This retrospective study collected clinical data from 771 children with primary kidney diseases treated in the Pediatric Nephrology Ward of the Affiliated Hospital of Zunyi Medical University between 2009 and 2023, comprising 376 children with NS and 395 children with acute glomerulonephritis. The data were preprocessed using multiple imputation, standardization, and encoding. General demographic characteristics, laboratory test indicators, and renal pathological characteristics were selected as modeling variables. Four machine learning algorithms, GBDT, XGBoost, random forest (RF), and LightGBM, were used to construct risk prediction models for disease onset. Model performance was evaluated with five-fold cross-validation, and feature importance was interpreted with the SHAP method.

[Results] All four models showed high predictive ability. The random forest model performed best, reaching an accuracy of 99.14%, precision of 99.13%, recall of 99.16%, F1 score of 0.9914, and AUC of 0.9983 on the validation set. SHAP analysis showed that plasma IgG, total protein, complement C3, and ASO titer contributed substantially to the model's predictions and were highly consistent with the clinicopathological mechanism of NS, supporting the reliability and clinical interpretability of the model.

[Conclusion] This study constructed a machine learning risk prediction model for pediatric NS that combines high accuracy with good clinical interpretability and provides data support for early screening and individualized treatment of NS. Future work should include multi-center and multi-omics validation to further improve the model's generalization ability and clinical value.
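
To make the evaluation protocol concrete, the following is a minimal sketch of how a stratified five-fold split and the reported accuracy, precision, recall, and F1 score are typically computed. It is not the authors' code: the function names `stratified_folds` and `binary_metrics` are hypothetical, treating NS as the positive class is an assumption, and the round-robin split is a simplification (a real pipeline would normally shuffle before splitting). Only the cohort sizes (376 NS, 395 acute glomerulonephritis) are taken from the abstract; no model predictions are reproduced.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, keeping each class spread evenly.

    Simplification: samples of each class are dealt round-robin without
    shuffling, so fold membership is deterministic.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Cohort sizes from the abstract; 1 = NS (assumed positive class), 0 = AGN.
labels = [1] * 376 + [0] * 395
folds = stratified_folds(labels, k=5)
fold_sizes = [len(fold) for fold in folds]  # [155, 154, 154, 154, 154]

# Toy predictions (made up for illustration, unrelated to the study's results).
toy = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In a full evaluation, each fold would serve once as the validation set while a model is trained on the other four, and the per-fold metrics would then be averaged.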
