Interval forecasts of weekly incident and cumulative COVID-19 mortality in the United States: A comparison of combining methods
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
A combined forecast from multiple models is typically more accurate than an individual forecast, but few studies have examined forecast combining in infectious disease forecasting. We investigated the accuracy of different ways of combining interval forecasts of weekly incident and cumulative coronavirus disease 2019 (COVID-19) mortality.
Methods
We considered weekly interval forecasts, for 1- to 4-week prediction horizons, with out-of-sample periods of approximately 18 months ending on 8 January 2022, for multiple locations in the United States, using data from the COVID-19 Forecast Hub. Our comparison involved simple and more complex combining methods, including methods that involve trimming outliers or performance-based weights. Prediction accuracy was evaluated using interval scores, weighted interval scores, skill scores, ranks, and reliability diagrams.
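As an illustration of the evaluation and the simplest combining methods named above, the following is a minimal sketch, not the authors' code. It assumes the standard interval score for a central (1 - alpha) interval, and it assumes that the mean and median combinations are taken separately over the lower and upper bounds of the individual models' intervals. The function names and the four example intervals are hypothetical.

```python
from statistics import mean, median


def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) interval; lower is better.

    Width of the interval plus a penalty of 2/alpha times the distance
    by which the observation falls outside the interval.
    """
    width = upper - lower
    below = (2.0 / alpha) * max(lower - observed, 0.0)
    above = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + below + above


def combine_intervals(intervals, how=median):
    """Combine (lower, upper) pairs bound by bound using `how` (assumed scheme)."""
    lowers = [lo for lo, _ in intervals]
    uppers = [hi for _, hi in intervals]
    return how(lowers), how(uppers)


# Hypothetical 95% intervals from four models; the last one is an outlier.
forecasts = [(80.0, 120.0), (90.0, 130.0), (85.0, 125.0), (10.0, 900.0)]
observed_deaths = 110.0  # hypothetical observation

median_interval = combine_intervals(forecasts, median)
mean_interval = combine_intervals(forecasts, mean)

median_score = interval_score(*median_interval, observed_deaths, alpha=0.05)
mean_score = interval_score(*mean_interval, observed_deaths, alpha=0.05)
print(median_interval, median_score)
print(mean_interval, mean_score)
```

In this made-up example the outlying model widens the mean interval far more than the median interval, so the mean combination receives the worse (larger) interval score even though both intervals contain the observation.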
Results
The weighted inverse score and median combining methods performed best for forecasts of incident deaths. Overall, the leading inverse score method was 12% better than the mean benchmark method in forecasting the 95% interval and, considering all interval forecasts, the median was 7% better than the mean. Overall, the median was the most accurate method for forecasts of cumulative deaths. Compared to the mean, the median’s accuracy was 65% better in forecasting the 95% interval, and 43% better considering all interval forecasts. For all combining methods except the median, combining forecasts from only compartmental models produced better forecasts than combining forecasts from all models.
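The percentage comparisons above can be read as skill scores relative to the mean benchmark. The sketch below shows one plausible form of inverse score weighting and of a percentage skill score. Both definitions are assumptions, since the abstract does not spell out the exact formulas; the function names and all numbers are hypothetical and are not taken from the study.

```python
def inverse_score_weights(past_scores):
    """Weights proportional to 1 / historical score, normalised to sum to 1.

    Assumes every past score is strictly positive (lower score = better model).
    """
    inverses = [1.0 / s for s in past_scores]
    total = sum(inverses)
    return [v / total for v in inverses]


def weighted_combine(intervals, weights):
    """Weighted average of the lower bounds and of the upper bounds."""
    lower = sum(w * lo for w, (lo, _) in zip(weights, intervals))
    upper = sum(w * hi for w, (_, hi) in zip(weights, intervals))
    return lower, upper


def skill_score_percent(method_score, benchmark_score):
    """Percentage improvement of a method over a benchmark (positive = better)."""
    return 100.0 * (1.0 - method_score / benchmark_score)


# Hypothetical historical interval scores for three models.
weights = inverse_score_weights([10.0, 20.0, 40.0])
combined = weighted_combine([(80.0, 120.0), (90.0, 130.0), (10.0, 900.0)], weights)

# A method scoring 88 against a benchmark score of 100 is about 12% better.
improvement = skill_score_percent(88.0, 100.0)
print(weights, combined, improvement)
```

Under this weighting, a model with a historically poor score still contributes to the combination but with a smaller weight, which is one way to limit the influence of outlying forecasts without discarding them.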
Conclusions
Combining forecasts can improve the contribution of probabilistic forecasting to health policy decision making during epidemics. The relative performance of combining methods depends on the extent of outliers and the type of models in the combination. The median combination has the advantage of being robust to outlying forecasts. Our results support the Hub’s use of the median and we recommend further investigation into the use of weighted methods.
Article activity feed
SciScore for 10.1101/2021.07.11.21260318:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics: not detected.
Sex as a biological variable: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Study limitations include the retrospective design, being based on the most recent version of the ‘truth data’ for all the weeks at the time of analysis, instead of the numbers of COVID-19 deaths that were reported at the time the forecasts were submitted. We have shown that there were several states for which there were notable effects of updates in death counts, due to reporting delays, and this adversely affected the accuracy of the forecasts of all the combining methods and models. However, this issue only had a minor effect on the relative performances of the methods, and did not alter our overall conclusions. Our reported findings are limited to U.S. data and the forecasts from the COVID-19 Forecast Hub, and so it is possible that different results may arise when applying the combining methods to forecasts from a different set of models, or using other data, such as forecasts for other locations, or predictions of COVID-19 cases or hospitalisations. These are interesting potential avenues for future research. The forecasts in our dataset were produced weekly for 1- to 4-week-ahead horizons, and we acknowledge that conclusions could be different for different time-scales. Our ability to detect statistical differences was limited by the small sample sizes, with only 17 locations in each category, missing data and a relatively short out-of-sample period. This research has important policy implications as forecasts from models have been placed at the forefront of public heal...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.