External Validation of Predictive Models for Diagnosis, Management and Severity of Pediatric Appendicitis

Ričards Marcinkevičs
Kacper Sokol
Akhil Paulraj
Melinda A. Hilbert
Vivien Rimili
Sven Wellmann
Christian Knorr
Bertram Reingruber
Julia E. Vogt
Patricia Reis Wolfertstetter

Abstract

Appendicitis is a common condition among children and adolescents. Machine learning models can offer much-needed tools for improved diagnosis, severity assessment and management guidance for pediatric appendicitis. However, to be adopted in practice, such systems must be reliable, safe and robust across various medical contexts, e.g., hospitals with distinct clinical practices and patient populations.

Methods

We performed external validation of models predicting the diagnosis, management and severity of pediatric appendicitis. Trained on a cohort of 430 patients admitted to the Children’s Hospital St. Hedwig (Regensburg, Germany), the models were validated on an independent cohort of 301 patients from the Florence-Nightingale-Hospital (Düsseldorf, Germany). The data included demographic, clinical, scoring, laboratory and ultrasound parameters. In addition, we explored the benefits of model retraining and inspected variable importance.
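The core of external validation, scoring a model that was fitted on one cohort on a second, independent cohort without refitting, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the AUROC metric, the single-feature `toy_model` and the tiny cohorts are all assumptions made up for the example and do not come from the study data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cohort = Tuple[List[List[float]], List[int]]  # (feature rows, binary labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def externally_validate(
    model: Callable[[List[float]], float],
    internal: Cohort,
    external: Cohort,
) -> Dict[str, float]:
    """Score the same frozen model on both cohorts; no refitting happens here."""
    results = {}
    for name, (rows, labels) in (("internal", internal), ("external", external)):
        results[name] = auroc(labels, [model(row) for row in rows])
    return results


def toy_model(row: List[float]) -> float:
    """Hypothetical stand-in for a fitted model: the risk score is the first feature."""
    return row[0]


# Invented toy cohorts, chosen only to show the mechanics.
internal_cohort: Cohort = ([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])
external_cohort: Cohort = ([[1.0], [3.0], [2.0], [4.0]], [0, 0, 1, 1])

report = externally_validate(toy_model, internal_cohort, external_cohort)
print(report)  # {'internal': 1.0, 'external': 0.75}
```

The point of the design is that `externally_validate` never touches the model's parameters, so any gap between the two numbers reflects how the frozen model transfers rather than a change in the model itself.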

Results

The distributions of most parameters differed between the datasets. Consequently, we saw a decrease in predictive performance for diagnosis, management and severity across most metrics. After retraining with a portion of external data, we observed gains in performance, which, nonetheless, remained lower than in the original study. Notably, the most important variables were consistent across the datasets.
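The retraining step can be illustrated in the same spirit. The sketch below assumes one plausible reading of "retraining with a portion of external data": the model is refitted on the original cohort pooled with part of the external cohort and then evaluated on the remaining external patients. The abstract does not specify the exact scheme, and the one-dimensional midpoint classifier and all data values here are hypothetical.

```python
from statistics import fmean
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, binary label)


def fit_midpoint(data: List[Sample]) -> float:
    """Fit a 1-D classifier: the cut-off is the midpoint between the class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return (fmean(pos) + fmean(neg)) / 2


def accuracy(threshold: float, data: List[Sample]) -> float:
    """Share of samples classified correctly by the rule 'predict 1 if x >= threshold'."""
    return sum((x >= threshold) == (y == 1) for x, y in data) / len(data)


# Invented toy data: the external feature values are shifted upwards.
internal = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
external_adapt = [(3.0, 0), (4.0, 0), (5.0, 1), (6.0, 1)]    # used for retraining
external_holdout = [(3.2, 0), (3.8, 0), (5.1, 1), (5.9, 1)]  # used only for evaluation

frozen = fit_midpoint(internal)                      # 2.5
retrained = fit_midpoint(internal + external_adapt)  # 3.5

print(accuracy(frozen, internal))             # 1.0
print(accuracy(frozen, external_holdout))     # 0.5
print(accuracy(retrained, external_holdout))  # 0.75
```

Keeping `external_holdout` out of the refit is what makes the last number a fair estimate. The toy values were picked so that retraining helps without fully closing the gap, but they say nothing about the magnitude of the effect in the study.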

Conclusions

While the performance of transferred models was satisfactory, it remained lower than on the original data. This study demonstrates challenges in transferring models between hospitals, especially when clinical practice and demographics differ or in the presence of externalities such as pandemics. We also highlight the limitations of retraining as a potential remedy since it could not restore predictive performance to the initial level.

Version published to 10.1101/2024.10.28.24316300 on medRxiv
Oct 29, 2024

Related articles

Comparison of the Appendicitis Inflammatory Response (AIR) and Pediatric Appendicitis Score (PAS) in Predicting Perforated Appendicitis in Children: A Prospective Study

This article has 1 author:
1. Husam IBRAHIMOGLU
This article has no evaluations. Latest version: Sep 1, 2025

Development and Validation of a Screening Model for Early Diagnosis of Biliary Atresia in Neonates with Cholestasis

This article has 11 authors:
1. Zhaozhou Liu
2. Yuyan Jin
3. Yong Zhao
4. Yanan Zhang
5. Shuangshuang Li
6. Junmin Liao
7. Kaiyun Hua
8. Yichao Gu
9. Dayan Sun
10. Dingding Wang
11. Jinshi Huang
This article has no evaluations. Latest version: Sep 9, 2025

Predictive model for differentiating malignant and benign small pulmonary nodules

This article has 5 authors:
1. Ming-Ze Li
2. Su-Qin Li
3. Yi-Bing Shi
4. Yao-Yao Wang
5. Tao Meng
This article has no evaluations. Latest version: Oct 8, 2025
