A seq2seq model to forecast the COVID-19 cases, deaths and reproductive R numbers in US counties

Abstract

The global pandemic of coronavirus disease 2019 (COVID-19) has killed almost two million people worldwide and over 400 thousand in the United States (US). As the pandemic evolves, informed policy-making and strategic resource allocation rely on accurate forecasts. To predict the spread of the virus within US counties, we curated an array of county-level demographic and COVID-19-relevant health risk factors. In combination with the county-level case and death numbers curated by Johns Hopkins University, we developed a forecasting model using deep learning (DL). We implemented an autoencoder-based Seq2Seq model with gated recurrent units (GRUs) in the deep recurrent layers. We trained the model to predict future incident cases, deaths and the reproductive number, R. For most counties, it makes accurate predictions of new incident cases, deaths and R values up to 30 days in the future. Our framework can also be used to predict other targets that are useful indices for policymaking, for example hospitalization or the occupancy of intensive care units. Our DL framework is publicly available on GitHub and can be adapted for other indices of the COVID-19 spread. We hope that our forecasts and model can help local governments in the continued fight against COVID-19.
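To make the forecasting setup concrete, the sketch below shows one common way to prepare a daily series for a Seq2Seq model: each training sample pairs a window of past values (the encoder input) with the values that follow it (the decoder target). It is an illustration only, not the authors' preprocessing. The 30-day horizon comes from the abstract; the 60-day input window, the function name `make_windows`, and the toy series are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    input_len: int,
    horizon: int = 30,
) -> List[Tuple[List[float], List[float]]]:
    """Slice a daily series into (encoder input, decoder target) pairs.

    input_len is a hypothetical choice; the abstract only states the
    30-day forecast horizon.
    """
    if input_len <= 0 or horizon <= 0:
        raise ValueError("input_len and horizon must be positive")

    pairs = []
    # Last start index that still leaves room for a full input and target.
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        split = start + input_len
        encoder_input = list(series[start:split])
        decoder_target = list(series[split:split + horizon])
        pairs.append((encoder_input, decoder_target))
    return pairs


# Toy data: 100 days of made-up values for one county (not real counts).
daily_cases = [float(day) for day in range(100)]
pairs = make_windows(daily_cases, input_len=60, horizon=30)
```

A series shorter than `input_len + horizon` yields no pairs. In a county-level setting, windows like these would typically be built per county and combined with static covariates such as the demographic and health risk factors mentioned above; how the authors do this is not described in the abstract.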

SciScore for 10.1101/2021.04.14.21255507:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Google has released data on daily mobility score changes reflecting the effects of social distancing and local lockdown measures, as well as school and business status.	Google suggested: (Google, RRID:SCR_017097)
Model selection and hyperparameter tuning: Hyperparameter tuning for the Seq2Seq model was carried out using the Keras API (version 2.3.1), the TensorFlow library (version 1.14.0) and HyperOpt (Bergstra, Yamins et al. 2013).	TensorFlow suggested: (tensorflow, RRID:SCR_016345)

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

One important limitation to consider when interpreting the forecasting results is that the attempt to calculate pandemic projections, such as ours and others, was based upon only the observation of emergent cases. However, case reporting is not uniform across entities. In the US, much of the information about cases is collected and collated at the level of the county or large municipalities, which then report to a state department of health, which in turn reports to national repositories. One entity may put a strong emphasis on testing individuals who present with symptoms, whereas another may have implemented a widespread asymptomatic surveillance policy. How and which cases are identified can be dramatically affected by such policy differences and testing strategies, which were largely influenced by non-objective policy decisions and human interpretation during the course of the pandemic. In short, our machine learning model, as well as most other forecasting models of COVID-19, only learns to predict the reported cases (or deaths and other indices), which were likely biased by non-objective influences that were not uniform across reporting entities. Since January 4, 2021, we have updated our forecast of deaths and R numbers in our GitHub repository each Monday (https://ylzhang29.github.io/UpstateSU-GRU-Covid/). Visualizations of several useful metrics that are derived from our forecasts are provided on the GitHub page to facilitate understanding. These visualizations...

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
