Derivation and validation of a clinical predictive model for longer duration diarrhea among pediatric patients in Kenya using machine learning algorithms

Billy Ogwel
Vincent H. Mzazi
Alex O. Awuor
Caleb Okonji
Raphael O. Anyango
Caren Oreso
John B. Ochieng
Stephen Munga
Dilruba Nasrin
Kirkby D. Tickell
Patricia B. Pavlinac
Karen L. Kotloff
Richard Omore

Abstract

Background

Despite the adverse health outcomes associated with longer duration diarrhea (LDD), there are currently no clinical decision tools for the timely identification and better management of children at increased risk. This study uses machine learning (ML) to derive and validate a predictive model for LDD among children presenting with diarrhea to health facilities.

Methods

LDD was defined as a diarrhea episode lasting ≥ 7 days. We used 7 ML algorithms to build prognostic models for the prediction of LDD among children < 5 years, using de-identified data from the Vaccine Impact on Diarrhea in Africa study (N = 1,482) for model development and data from the Enterics for Global Health Shigella study (N = 682) for temporal validation of the champion model. Features included demographic, medical history and clinical examination data collected at enrolment in both studies. During model development, we used split-sampling and K-fold cross-validation with an over-sampling technique. Critical predictors of LDD and their impact on prediction were obtained using an explainable, model-agnostic approach. The champion model was selected based on the area under the curve (AUC). Model calibration was assessed using the Brier score and Spiegelhalter's z-test with its accompanying p-value.
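The two calibration measures named above can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions of the Brier score and Spiegelhalter's z-statistic, not the authors' code; the function names and the toy labels and probabilities are assumptions made up for the example.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def spiegelhalter_z(y_true, p_pred):
    """Spiegelhalter's z-statistic and two-sided p-value.

    Under the null hypothesis of perfect calibration, z is approximately
    standard normal. Assumes at least one prediction is strictly between
    0 and 1 and not all equal to 0.5, otherwise the variance term is zero.
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, p_pred))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in p_pred)
    z = numerator / math.sqrt(variance)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical toy data (not from the study): 1 = LDD, 0 = no LDD.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.2, 0.8, 0.9]

toy_brier = brier_score(y_toy, p_toy)
toy_z, toy_p = spiegelhalter_z(y_toy, p_toy)
print(f"Brier = {toy_brier:.3f}, z = {toy_z:.3f}, p = {toy_p:.3f}")
```

A lower Brier score indicates predictions closer to the observed outcomes, and a large Spiegelhalter p-value means the data give no evidence against good calibration.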

Results

There was a significant difference in the prevalence of LDD between the development and temporal validation cohorts (478 [32.3%] vs 69 [10.1%]; p < 0.001). The following variables were associated with LDD, in decreasing order: pre-enrolment diarrhea days (55.1%), modified Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), respiratory rate (6.5%), vomiting (6.4%), vomit frequency (6.2%), rotavirus vaccination (6.1%), skin pinch (2.4%) and stool frequency (2.4%). While all models showed good prediction capability, the random forest model achieved the best performance, with an AUC (95% confidence interval) of 83.0 (78.6–87.5) on the development dataset and 71.0 (62.5–79.4) on the temporal validation dataset. Although the random forest model deviated slightly from perfect calibration, the deviations were not statistically significant (Brier score = 0.17, Spiegelhalter p-value = 0.219).
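The prevalence comparison above can be checked from the reported counts. The abstract does not say which test produced the p-value, so the sketch below assumes a Pearson chi-square test on the 2×2 table without continuity correction; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported in the abstract.
dev_ldd, dev_n = 478, 1482   # development cohort
val_ldd, val_n = 69, 682     # temporal validation cohort

dev_pct = round(100 * dev_ldd / dev_n, 1)
val_pct = round(100 * val_ldd / val_n, 1)
chi2, p_value = chi_square_2x2(dev_ldd, dev_n - dev_ldd, val_ldd, val_n - val_ldd)

print(f"Prevalence: {dev_pct}% vs {val_pct}%")
print(f"chi-square = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```

Under this assumed test, the counts reproduce the reported prevalences of 32.3% and 10.1% and a p-value well below 0.001.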

Conclusions

Our study suggests that ML-derived algorithms could be used to rapidly identify children at increased risk of LDD. Integrating ML-derived models into clinical decision-making may allow clinicians to target these children with closer observation and enhanced management.

Version published to 10.1186/s12911-025-02855-6
Jan 15, 2025
Version published to 10.21203/rs.3.rs-4048898/v1 on Research Square
Mar 15, 2024

Development and validation of machine learning models for predicting short- and long-term mortality in gastroparesis patients: a retrospective cohort study using the MIMIC-IV database

This article has 5 authors:
1. Lei Zhu
2. Qi Han
3. Bei Pei
4. Jie Zhang
5. Haolong Qi
This article has no evaluations. Latest version Dec 31, 2025
Development and Validation of a Machine Learning-Based Risk Prediction Model for PICC-Related Bloodstream Infections in Premature Infants Using SHAP Interpretability

This article has 5 authors:
1. Yongqin Guo
2. Yingying Dou
3. Wenxia Song
4. Lihong Wang
5. Li Wang
This article has no evaluations. Latest version Dec 30, 2025
Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
This article has no evaluations. Latest version Jan 25, 2026
