AI for Cholera Outbreak Prediction, Real-Time Tracking, and Low-Resource Diagnostics using Federated and Privacy-Preserving Machine Learning

Idowu Olugbenga Adewumi

Abstract

This study presents a multi-model computational framework for predicting cholera outbreaks using spatio-temporal, climatic, and socio-environmental predictors across regions with recurrent epidemics. The dataset included 17,842 records of regions and days from January 2024 to May 2025, divided into 70% for training, 15% for validation, and 15% for testing in cross-sectional models, along with rolling-origin splits for time-series models. Two forecasting tasks were examined: (i) prediction of reported cases and (ii) categorization of outbreak severity into low (0–9 cases), medium (10–29 cases), and high (≥ 30 cases). Baseline statistical evaluations utilized Poisson and Negative Binomial regression methods. Overdispersion tests (variance/mean ratio = 2.7) highlighted the advantages of Negative Binomial models, which identified rainfall (IRR = 1.18, 95% CI: 1.10–1.26) and water salinity (IRR = 1.11, 95% CI: 1.06–1.16) as major contributors to outbreak risk, whereas sanitation coverage lowered incidence rates by 23% (IRR = 0.77, 95% CI: 0.71–0.84). Experiments with machine learning demonstrated significant enhancements in performance. Random Forest regression lowered RMSE from 41.2 (baseline) to 28.9, whereas classification reached a macro-F1 of 0.81. XGBoost enhanced classification results with macro-F1 = 0.87 and ROC-AUC = 0.91, surpassing Random Forest (macro-F1 = 0.79, ROC-AUC = 0.86). SHAP analysis identified rainfall, sanitation, and mobility index as the three primary factors, responsible for 62% of the variance in predicting outbreaks. Deep learning utilizing Long Short-Term Memory (LSTM) networks delivered the most precise time-based predictions. For a 7-day forecast, LSTM produced RMSE = 25.3 ± 6.2, MAE = 18.4 ± 4.7, and MAPE = 12.8 ± 3.1, while ARIMA showed RMSE = 27.9 ± 7.4 and MAPE = 17.5 ± 4.5, and naive benchmarks had MAPE ≥ 20%. 
Over a 14-day period, LSTM maintained its advantage with RMSE = 39.5 ± 10.2 and MAPE = 20.5 ± 5.6, surpassing ARIMA (RMSE = 41.2 ± 11.0; MAPE = 24.7 ± 6.3). Federated learning trials involving 5 regional clients showed performance comparable to centralized learning, achieving an accuracy of 0.84 (without differential privacy) and 0.78 (with DP, σ = 1.0). Privacy-utility trade-offs resulted in ε = 3.1–7.8 for δ = 1e-5, confirming practicality in low-bandwidth settings (average communication overhead = 11.4 MB per round). The results indicate that LSTM-based forecasting increases epidemic prediction accuracy by as much as 25% compared to ARIMA and 35% compared to naive methods, while XGBoost boosts outbreak severity classification by 8% relative to Random Forests. Federated models guarantee privacy-focused scalability with merely 5–9% loss in utility. These findings highlight the promise of combining ensemble learning, deep temporal models, and federated AI to create resilient, data-sovereign public health surveillance systems for areas susceptible to cholera.
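
The abstract reports several quantities without showing how they are computed: the three severity bands, the variance/mean ratio used to motivate the Negative Binomial model, and the RMSE, MAE, and MAPE forecast errors. The following is a minimal sketch of those calculations in plain Python, not the author's code. The function names and the sample numbers are hypothetical, and the choice to skip zero-case days when computing MAPE is an assumption, since the abstract does not say how such days were handled.

```python
import math
from statistics import mean, variance


def severity(cases: int) -> str:
    """Map a case count to the bands stated in the abstract:
    low (0-9), medium (10-29), high (>= 30)."""
    if cases < 0:
        raise ValueError("case count must be non-negative")
    if cases <= 9:
        return "low"
    if cases <= 29:
        return "medium"
    return "high"


def dispersion_ratio(counts):
    """Sample variance divided by the mean.
    A ratio well above 1 suggests overdispersion relative to Poisson."""
    counts = list(counts)
    return variance(counts) / mean(counts)


def forecast_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE in percent) for paired series.
    Assumption: days with zero actual cases are left out of MAPE,
    because the percentage error is undefined there."""
    pairs = list(zip(actual, predicted))
    errors = [p - a for a, p in pairs]
    rmse = math.sqrt(mean([e * e for e in errors]))
    mae = mean([abs(e) for e in errors])
    pct = [abs(p - a) / a for a, p in pairs if a != 0]
    mape = 100 * mean(pct) if pct else float("nan")
    return rmse, mae, mape


# Illustrative numbers only; they are not taken from the study's dataset.
actual = [4, 12, 35, 8, 20]
predicted = [6, 10, 30, 8, 25]

labels = [severity(c) for c in actual]
ratio = dispersion_ratio(actual)
rmse, mae, mape = forecast_errors(actual, predicted)
```

A ratio such as the reported 2.7 means the counts vary more than a Poisson model allows, which is the usual reason to prefer a Negative Binomial model for count data of this kind.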

Version published to 10.21203/rs.3.rs-7441133/v1 on Research Square
Aug 27, 2025

Related articles

Rainfall, Mosquito Indices, and Dengue Outbreaks in Southern Taiwan: Reassessing Predictive Modeling with Machine Learning Approaches

This article has 1 author:
1. Hsiang Hong
This article has no evaluations. Latest version Sep 12, 2025

A mathematical modeling and method for predicting COVID-19-like infectious disease outbreaks

This article has 3 authors:
1. Jiaen Zheng
2. Jiaxing Zheng
3. Ruobin Zheng
This article has no evaluations. Latest version Sep 26, 2025

Revealing Dengue Dynamics: A Novel Bin-Wise Gaussian Process Model for Probabilistic Forecasting

This article has 4 authors:
1. Ewerton Rocha Vieira
2. Konstantin Mischaikow
3. Claudia M.E. Romero Vivas
4. Ubydul Haque
This article has no evaluations. Latest version Oct 16, 2025
