Bayesian hybrid statistical and machine learning models for dengue forecasting in Bangladesh: Temporal and spatial analysis for an early warning system


Abstract

Dengue remains a major public health concern in Bangladesh, yet reliable forecasting models that integrate climatic and demographic drivers are limited. Developing an early warning system (EWS) capable of anticipating outbreaks is critical for effective prevention and control. We analysed hospital-based dengue surveillance data covering admissions from January 2000 to August 2025 alongside climatic (temperature, rainfall, humidity) and demographic (population density, proportion of urban population) covariates. We evaluated a suite of Bayesian hybrid statistical and machine learning models, including SARIMA–Poisson, SARIMA–negative binomial (NB), SARIMA–SVM, SARIMA–LSTM, and SARIMA–XGBoost. Model performance was assessed using the Leave-One-Out Information Criterion (LOOIC), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Continuous Ranked Probability Score (CRPS), and coverage probability (CVG). Sensitivity and specificity were also computed to assess early warning performance. Spatial dependence was examined using Global Moran’s I and Local Moran’s I (LISA) cluster maps, based on district-level monthly data on hospitalised dengue cases for 2019 and for 2022 to 2024. Rainfall and the proportion of urban population emerged as significant drivers of dengue incidence, while temperature, humidity, and population density were less influential. Global Moran’s I indicated no significant spatial autocorrelation at the district level; however, LISA maps identified localised hotspots. Among the candidate models, the Bayesian SARIMA–XGBoost hybrid achieved the best predictive performance, with the lowest CRPS and the highest CVG, and provided the most balanced sensitivity–specificity trade-off. Forecasts for January to August 2025 accurately reproduced seasonal dynamics, predicting a sharp rise during the monsoon, with peak incidence in July.
Although magnitudes were overestimated, outbreak timing was well captured. The Bayesian SARIMA–XGBoost hybrid model offers a robust framework for probabilistic dengue forecasting in Bangladesh. By linking upstream surveillance data to forecast production, this study demonstrates the potential for a fully implemented EWS to strengthen outbreak preparedness. Future work should incorporate finer spatial resolution, real-time climate forecasts, and entomological indicators to enhance operational deployment.
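
The evaluation metrics named in the abstract (RMSE, MAE, CRPS, and coverage probability) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that probabilistic forecasts are available as an array of posterior predictive draws with shape (number of draws, number of months), that point forecasts are compared against observed counts, and that coverage is taken at the 95% level; the function names and the sample-based CRPS estimator are choices made here for illustration.

```python
import numpy as np


def rmse(y, pred):
    """Root mean square error between observed and point-forecast values."""
    y, pred = np.asarray(y, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((y - pred) ** 2)))


def mae(y, pred):
    """Mean absolute error between observed and point-forecast values."""
    y, pred = np.asarray(y, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(y - pred)))


def crps_from_draws(y, draws):
    """Sample-based CRPS, averaged over time points.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the predictive distribution.
    `draws` has shape (n_draws, n_times); `y` has shape (n_times,).
    """
    y = np.asarray(y, dtype=float)
    draws = np.asarray(draws, dtype=float)
    # Mean absolute distance between each draw and the observation.
    term1 = np.mean(np.abs(draws - y), axis=0)
    # Mean absolute distance between all pairs of draws, per time point.
    pairwise = np.abs(draws[:, None, :] - draws[None, :, :])
    term2 = np.mean(pairwise, axis=(0, 1))
    return float(np.mean(term1 - 0.5 * term2))


def coverage(y, draws, level=0.95):
    """Share of observations that fall inside the central predictive interval."""
    y = np.asarray(y, dtype=float)
    draws = np.asarray(draws, dtype=float)
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return float(np.mean((y >= lower) & (y <= upper)))
```

Lower CRPS indicates sharper and better calibrated predictive distributions, while coverage close to the nominal level indicates that the predictive intervals are neither too narrow nor too wide. The pairwise term uses memory proportional to the square of the number of draws, so thinning the draws may be needed for long posterior samples.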
