Machine Learning and Probabilistic Approaches for Forecasting Infectious Disease Transmission and Cases

Abstract

Objectives

Forecasting the effective reproductive number (R_t) and infection case counts is critical for guiding public health responses. We developed a framework that uses machine learning to forecast R_t and probabilistic modeling to forecast daily COVID-19 cases across South Carolina counties, with the flexibility to generalize to other infectious diseases.

Methods

We first estimated R_t using the EpiNow2 R package, which incorporates Bayesian time-series modeling and accounts for reporting delay and incubation period. These initial estimates were refined using spatial covariate-adjusted smoothing through the Integrated Nested Laplace Approximation (INLA). We then generated R_t forecasts using an ensemble of linear regression, random forest, and XGBoost models. Daily case forecasts were obtained by linking R_t trajectories with historical case data via a Poisson model.
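The last two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the ensemble combines its members or how the Poisson mean is parameterized. The sketch assumes an unweighted average of the three models' R_t forecasts and the standard renewal equation, lambda_t = R_t * sum_s w_s * I_(t-s), where w is a generation-interval distribution. The function names, the weights, and all numbers are hypothetical.

```python
import math


def ensemble_mean(forecasts):
    """Average per-day R_t forecasts across models (unweighted: an assumption)."""
    n_models = len(forecasts)
    horizon = len(forecasts[0])
    return [sum(f[d] for f in forecasts) / n_models for d in range(horizon)]


def expected_cases(history, rt_forecast, gen_interval):
    """Project Poisson means forward with the renewal equation.

    history      -- observed daily case counts, oldest first
    rt_forecast  -- forecast R_t values, one per future day
    gen_interval -- weights w_1, w_2, ... (assumed to sum to 1)
    """
    cases = list(history)
    means = []
    for rt in rt_forecast:
        # Infection pressure: recent cases weighted by the generation interval.
        pressure = sum(
            w * cases[-s]
            for s, w in enumerate(gen_interval, start=1)
            if s <= len(cases)
        )
        lam = rt * pressure
        means.append(lam)
        cases.append(lam)  # plug the mean back in for later days
    return means


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


# Hypothetical 2-day R_t forecasts from three models for one county.
rt_lr, rt_rf, rt_xgb = [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]
rt_ens = ensemble_mean([rt_lr, rt_rf, rt_xgb])
lam = expected_cases([100, 100, 100], rt_ens, [0.5, 0.3, 0.2])
print(rt_ens, lam)
```

Feeding the Poisson mean forward as a plug-in value keeps the sketch deterministic; a probabilistic forecast would instead draw case counts from each day's Poisson distribution and propagate many simulated trajectories.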

Results

This ensemble-based approach outperformed EpiNow2 across all three forecast horizons (7-day, 14-day, and 21-day). In the first forecast period (November 11, 2020 – February 2, 2021), the ensemble achieved a median PA of 96.5% (IQR: 95.4% – 97.1%) for the 7-day-horizon R_t forecast, compared to 87.0% (IQR: 84.4% – 89.4%) for EpiNow2. In the second period (December 11, 2022 – March 4, 2023), the ensemble attained a median PA of 93.0% (IQR: 90.8% – 95.4%) for the R_t forecast, while EpiNow2 reached 86.8% (IQR: 82.5% – 89.2%). Similar trends were observed for case forecasts, where the ensemble model again outperformed EpiNow2.

Conclusion

This study presents a flexible forecasting framework that integrates Bayesian estimation, spatial smoothing, and ensemble machine learning to improve the accuracy of COVID-19 transmission and case forecasts. The approach enhances epidemic forecasting performance and offers scalable tools to support data-driven public health preparedness and response.
