Predicting increases in COVID-19 incidence to identify locations for targeted testing in West Virginia: A machine learning enhanced approach


Abstract

During the COVID-19 pandemic, West Virginia developed an aggressive SARS-CoV-2 testing strategy that included pop-up mobile testing in locations anticipated to have near-term increases in SARS-CoV-2 infections. This study describes and compares two methods for predicting near-term SARS-CoV-2 incidence in West Virginia counties. The first method, Rt Only, produces forecasts for each county based solely on the daily values of the instantaneous reproduction number, Rt. The second method, ML+Rt, is a machine learning approach that uses a Long Short-Term Memory (LSTM) network to predict the near-term number of cases for each county. It draws on epidemiological statistics such as Rt, county population information, and time series trends including major holidays, and it also leverages statewide COVID-19 trends across counties and county population size. Both approaches used daily county-level SARS-CoV-2 incidence data provided by the West Virginia Department of Health and Human Resources beginning in April 2020. The methods are compared on how accurately they predict near-term increases in SARS-CoV-2 incidence by county over the 17 weeks from January 1, 2021 to April 30, 2021. Both methods performed well (the week-over-week correlation between the forecasted and actual number of cases is 0.872 for the ML+Rt method and 0.867 for the Rt Only method), but their performance differs at various time points. Over the 17-week assessment period, the ML+Rt method outperforms the Rt Only method in identifying larger spikes. Results show that both methods perform adequately for both rural and non-rural counties. Finally, we provide a detailed discussion of practical issues in implementing Rt-based forecasting models for public health action, and of the potential for further development of machine learning methods enhanced by Rt.
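The abstract describes the Rt Only method only at a high level. As a minimal sketch, assuming the standard Cori et al. (2013) estimator with EpiEstim-style defaults (a 7-day window and a Gamma prior with mean 5 and standard deviation 5) and a hypothetical discretized serial-interval distribution, Rt can be estimated from daily incidence I_t via the total infectiousness Λ_t = Σ_k w_k I_{t−k}, then held fixed to project cases forward with the renewal equation E[I_t] = Rt Λ_t. The function names (`infectiousness`, `cori_rt`, `forecast`), the weights, and the case counts are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence


def infectiousness(incidence: Sequence[float], weights: Sequence[float], t: int) -> float:
    """Total infectiousness Lambda_t = sum_{k>=1} w_k * I_{t-k}.

    weights[0] is w_1 (interval of 1 day), weights[1] is w_2, and so on.
    Days before the start of the series contribute nothing.
    """
    total = 0.0
    for k, w in enumerate(weights, start=1):
        if t - k >= 0:
            total += w * incidence[t - k]
    return total


def cori_rt(incidence: Sequence[float], weights: Sequence[float], t: int,
            window: int = 7, a: float = 1.0, b: float = 5.0) -> float:
    """Posterior mean of R_t over the `window` days ending on day t.

    Assumes a Gamma prior with shape `a` and scale `b` (mean 5, sd 5 by
    default); the posterior mean is (a + sum I_s) / (1/b + sum Lambda_s).
    """
    if t - window + 1 < 0:
        raise ValueError("not enough days of incidence for this window")
    days = range(t - window + 1, t + 1)
    cases = sum(incidence[s] for s in days)
    pressure = sum(infectiousness(incidence, weights, s) for s in days)
    return (a + cases) / (1.0 / b + pressure)


def forecast(incidence: Sequence[float], weights: Sequence[float],
             rt: float, horizon: int = 7) -> List[float]:
    """Expected daily cases for the next `horizon` days with R_t held fixed.

    Each projected day uses E[I_t] = R_t * Lambda_t, feeding projected
    values back into the series as they are produced.
    """
    series = list(incidence)
    for _ in range(horizon):
        t = len(series)
        series.append(rt * infectiousness(series, weights, t))
    return series[len(incidence):]


if __name__ == "__main__":
    # Hypothetical serial-interval weights w_1..w_5 (sum to 1) and case counts.
    w = [0.1, 0.3, 0.3, 0.2, 0.1]
    cases = [10, 12, 15, 14, 18, 20, 22, 25, 24, 28, 30, 33, 35, 38]
    rt = cori_rt(cases, w, t=len(cases) - 1)
    weekly = sum(forecast(cases, w, rt))
    print(f"R_t = {rt:.3f}, projected cases over next 7 days = {weekly:.1f}")
```

Holding Rt constant over the horizon is the simplest choice; it is what makes this a pure Rt-based projection, but it cannot anticipate changes in transmission during the forecast week.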


  1. SciScore for 10.1101/2021.10.06.21264569

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: We utilize a Long Short-Term Memory (LSTM) recurrent neural network (Hochreiter & Schmidhuber, 1997), implemented in Python with an Adam optimizer, as our model of interest for this analysis, permitting consideration of all available county-specific input information for the past 7 days with a prediction of the number of positive cases for the county as an output.
    Resource: Python (suggested: IPython, RRID:SCR_001658)
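The quoted methods sentence says the LSTM sees the past 7 days of county-specific inputs and outputs a case count. A minimal, dependency-free sketch of how such training samples could be assembled follows, assuming one feature row per county-day and a target equal to the case total over the next `horizon` days (1 for next-day, 7 for a weekly total). The feature layout and `make_windows` are hypothetical; the network itself (Python, Adam optimizer) is not reproduced here.

```python
from typing import List, Sequence, Tuple

LOOKBACK = 7  # days of county-specific inputs per sample, as in the text


def make_windows(
    features: Sequence[Sequence[float]],
    cases: Sequence[float],
    lookback: int = LOOKBACK,
    horizon: int = 1,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice one county's daily records into supervised samples.

    Sample i takes the feature rows for days i .. i+lookback-1 as input and
    the total cases over the `horizon` days starting at day i+lookback as
    its target. Returns empty lists if the series is too short.
    """
    if len(features) != len(cases):
        raise ValueError("features and cases must cover the same days")
    X: List[List[List[float]]] = []
    y: List[float] = []
    for i in range(len(features) - lookback - horizon + 1):
        X.append([list(row) for row in features[i:i + lookback]])
        y.append(sum(cases[i + lookback:i + lookback + horizon]))
    return X, y


if __name__ == "__main__":
    # Hypothetical daily rows for one county: [R_t, cases, holiday_flag].
    rows = [[1.1, 20, 0], [1.0, 22, 0], [0.9, 19, 1], [1.2, 25, 0],
            [1.3, 30, 0], [1.1, 28, 0], [1.0, 27, 0], [1.2, 31, 0],
            [1.4, 36, 0], [1.3, 35, 0]]
    daily_cases = [r[1] for r in rows]
    X, y = make_windows(rows, daily_cases, horizon=1)
    print(len(X), len(X[0]), len(X[0][0]), y)
```

Once converted to an array, the nested lists have the (samples, timesteps, features) layout that recurrent layers such as Keras's `LSTM` expect as input.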

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Each of the methods for incidence prediction has strengths and weaknesses. The Rt Only method assumes that all positive cases are known. In practice, however, this assumption is unreasonable, and it highlights some of the problems with applying the standard Cori Rt model to SARS-CoV-2 data. The Rt Only approach relies on the most recent testing data available, and our daily incidence I_t represents the number of positive test results from tests performed on day t. Publicly reported case numbers (Dong, Du, & Gardner, 2020) typically represent the number of positive test results reported on the respective day, but the lag between test procurement and reporting varies. Using the day tests were procured eliminates one additional source of variability and brings our proxy for the “serial interval” closer to the relevant distribution, which would be the infectivity profile (see Challen, Brooks-Pollock, Tsaneva-Atanasova, & Danon, 2020; Britton & Scalia Tomba, 2019; Gostic et al., 2020). However, this raises a practical issue: data for day t are typically incomplete on day t and are reported gradually over several days. To address this issue, we estimate SARS-CoV-2 incidence using data from 3 days prior (τ_report = 3 days, i.e., incidence at t − 3 days). For example, the weekly total reported on day t = May 12, 2021 represents the week ending on May 9, 2021, and it is this incidence that is used to predict SARS-CoV-2 incidence for the subsequent 7 days. A second issue with the Rt Only method is th...
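The 3-day reporting lag can be made concrete with a small date calculation. The sketch below assumes positives are keyed by test-procurement date and that the week used on report day t is the 7 consecutive days ending at t − 3; `lagged_week`, `lagged_weekly_total`, and `REPORT_LAG_DAYS` are hypothetical names. It reproduces the example above, mapping May 12, 2021 to the week ending May 9, 2021.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

REPORT_LAG_DAYS = 3  # tau_report: days allowed for late-arriving results


def lagged_week(report_day: date, lag: int = REPORT_LAG_DAYS) -> Tuple[date, date]:
    """First and last test-procurement days of the week used on `report_day`."""
    end = report_day - timedelta(days=lag)
    start = end - timedelta(days=6)  # 7 days, inclusive of both ends
    return start, end


def lagged_weekly_total(daily_by_test_date: Dict[date, int], report_day: date,
                        lag: int = REPORT_LAG_DAYS) -> int:
    """Sum positives by test-procurement date over the lagged week.

    Days missing from the mapping are treated as zero positives.
    """
    start, _ = lagged_week(report_day, lag)
    return sum(daily_by_test_date.get(start + timedelta(days=i), 0) for i in range(7))


if __name__ == "__main__":
    start, end = lagged_week(date(2021, 5, 12))
    print(start, end)  # prints 2021-05-03 2021-05-09
```

Treating missing days as zero keeps the sketch simple; in practice a missing day may instead signal incomplete reporting, which is exactly what the lag is meant to absorb.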

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding.