Machine Learning and Probabilistic Approaches for Forecasting COVID-19 Transmission and Cases
Abstract
Forecasting the effective reproductive number (Rt) and COVID-19 case counts is critical for guiding public health responses. We developed a machine learning and probabilistic forecasting framework to predict Rt and daily case counts at the county level in South Carolina (SC). Our approach used initial Rt estimates from the EpiNow2 R package, refined with spatial (covariate-adjusted) smoothing. We then generated Rt forecasts using an ensemble of regression, Random Forest, and XGBoost models, and predicted case counts with a probabilistic Poisson model.
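To make the last step concrete, the following is a minimal sketch of how a forecast Rt value could be turned into a probabilistic case forecast. The abstract does not state how the Poisson mean is constructed, so the renewal-equation link used here (expected cases = Rt times a generation-interval-weighted sum of recent cases) is an assumption, as are the function names and the example weights.

```python
import math


def expected_cases(rt, past_cases, gen_interval):
    """Assumed renewal-equation mean: lambda_t = R_t * sum_s w_s * I_{t-s}.

    past_cases is ordered oldest to newest; gen_interval[0] is the weight
    for a 1-day lag. The weights are hypothetical, not from the paper.
    """
    recent_first = reversed(past_cases)
    infectiousness = sum(w * c for w, c in zip(gen_interval, recent_first))
    return rt * infectiousness


def poisson_quantile(lam, q):
    """Smallest k with P(X <= k) >= q for X ~ Poisson(lam), 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if lam <= 0:
        return 0
    k, cdf = 0, 0.0
    while True:
        # Evaluate the pmf in log space so large means do not underflow.
        cdf += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        if cdf >= q:
            return k
        k += 1


def forecast_interval(rt, past_cases, gen_interval, level=0.95):
    """Return (mean, lower, upper) for a central Poisson predictive interval."""
    lam = expected_cases(rt, past_cases, gen_interval)
    alpha = (1.0 - level) / 2.0
    return lam, poisson_quantile(lam, alpha), poisson_quantile(lam, 1.0 - alpha)
```

Calling `forecast_interval(1.2, [10, 20, 30], [0.5, 0.3, 0.2])` would give a mean of about 27.6 cases together with lower and upper bounds of a 95% predictive interval. In practice the Rt input would come from the ensemble forecast rather than a fixed number.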
This ensemble-based approach consistently outperformed EpiNow2 across forecast horizons (7, 14, and 21 days). In the first forecast period (November 11, 2020 – February 02, 2021), the ensemble achieved a median percentage agreement (PA) across counties of 94.4% (IQR: 93.8% – 95.3%) for 7-day-ahead Rt forecasts, compared with 87.0% (IQR: 84.4% – 89.4%) for EpiNow2. In the second period (December 11, 2022 – March 04, 2023), the ensemble attained a median PA across counties of 93.0% (IQR: 91.3% – 94.1%) for Rt forecasts, while EpiNow2 reached 86.8% (IQR: 82.5% – 89.2%). Similar trends were observed for case forecasts, with the ensemble model demonstrating improved stability and performance. Combining spatial smoothing with ensemble modeling improves epidemic forecasting by enhancing predictive performance and robustness.
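The summary statistics reported above (median PA and IQR across counties) can be illustrated with a short sketch. The abstract does not define PA, so the formula below, 100 × (1 − mean relative absolute error), is an assumed definition for illustration only, and the function names and county labels are hypothetical.

```python
from statistics import median, quantiles


def percentage_agreement(forecast, observed):
    """Assumed PA: 100 * (1 - mean(|forecast - observed| / observed))."""
    rel_errors = [abs(f - o) / o for f, o in zip(forecast, observed)]
    return 100.0 * (1.0 - sum(rel_errors) / len(rel_errors))


def summarize_across_counties(pa_by_county):
    """Return (median, Q1, Q3) of county-level PA values."""
    values = sorted(pa_by_county.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

Each county's Rt forecasts would be scored against its reference Rt series with `percentage_agreement`, and the resulting per-county values passed to `summarize_across_counties` to obtain a median and interquartile range.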