Early detection of COVID-19 outbreaks using human mobility data

This article has been Reviewed by the following groups


Abstract

Contact mixing plays a key role in the spread of COVID-19. Thus, mobility restrictions of varying degrees, up to and including nationwide lockdowns, have been implemented in over 200 countries. To appropriately target the timing, location, and severity of measures intended to encourage social distancing at a country level, it is essential to predict when and where outbreaks will occur, and how widespread they will be.

Methods

We analyze aggregated, anonymized health data and cell phone mobility data from Israel. We develop predictive models for daily new cases and the test positivity rate over the next 7 days for different geographic regions in Israel. We evaluate model goodness of fit using root mean squared error (RMSE). We use these predictions in a five-tier categorization scheme to predict the severity of COVID-19 in each region over the next week. We measure magnitude accuracy (MA), the extent to which the correct severity tier is predicted.
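The two evaluation metrics can be sketched in a few lines of Python. The abstract does not give the tier cut-offs, so `TIER_THRESHOLDS` below is a hypothetical set of four boundaries (five tiers) chosen only for illustration. Magnitude accuracy is taken here to mean the fraction of region-days whose predicted tier equals the observed tier, and the within-one-tier score is the fraction whose tiers differ by at most one.

```python
import bisect
import math

# Hypothetical cut-offs (e.g., new cases per 100,000 people) splitting values
# into five tiers numbered 0..4. These are NOT the thresholds used in the study.
TIER_THRESHOLDS = [5.0, 10.0, 20.0, 40.0]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def tier(value, thresholds=TIER_THRESHOLDS):
    """Map a value to a tier index; a value equal to a cut-off goes to the higher tier."""
    return bisect.bisect_right(thresholds, value)


def tier_accuracy(y_true, y_pred, tolerance=0, thresholds=TIER_THRESHOLDS):
    """Fraction of predictions whose tier is within `tolerance` tiers of the true tier.

    tolerance=0 gives magnitude accuracy (exact tier match);
    tolerance=1 gives accuracy to within one tier.
    """
    hits = sum(
        abs(tier(t, thresholds) - tier(p, thresholds)) <= tolerance
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


# Toy values for illustration only (not data from the study).
observed = [3.0, 12.0, 25.0, 45.0]
predicted = [4.0, 8.0, 22.0, 38.0]

print(round(rmse(observed, predicted), 3))              # 4.33
print(tier_accuracy(observed, predicted))               # 0.5
print(tier_accuracy(observed, predicted, tolerance=1))  # 1.0
```

With these toy values, two of the four predictions land in the correct tier, and every miss is off by exactly one tier, which is why the two accuracy scores differ.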

Results

Models using mobility data outperformed models that did not use mobility data, reducing RMSE by 17.3% when predicting new cases and by 10.2% when predicting the test positivity rate. The best set of predictors for new cases consisted of a 1-day lag of the past 7-day average of new cases, along with a measure of internal movement within a region. The best set of predictors for the test positivity rate consisted of a 3-day lag of the past 7-day average test positivity rate, along with the same measure of internal movement. Using these predictors, RMSE was 4.812 cases per 100,000 people when predicting new cases and 0.79% when predicting the test positivity rate. MA in predicting new cases was 0.775, and accuracy of prediction to within one tier was 1.0. MA in predicting the test positivity rate was 0.820, and accuracy to within one tier was 0.998.
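The predictor sets reported above can be built from daily series with a trailing average followed by a lag. This Python sketch assumes a linear model fitted by ordinary least squares, which the abstract does not specify; `mobility` is a hypothetical stand-in for the internal-movement measure, and the synthetic data and noiseless target exist only to exercise the code.

```python
import numpy as np


def trailing_mean(x, window=7):
    """Trailing `window`-day average; entry i averages x[i-window+1 .. i].

    The first window-1 entries are NaN because a full window is not yet available.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[j] = sum of x[0..j-1]
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def lag(x, k):
    """Shift a series so that entry i holds the value from day i-k (NaN-padded)."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    out[k:] = x[: len(x) - k]
    return out


def fit_ols(features, target):
    """Ordinary least squares with an intercept, dropping rows that contain NaN."""
    X = np.column_stack([np.ones(len(target))] + [np.asarray(f, dtype=float) for f in features])
    y = np.asarray(target, dtype=float)
    mask = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return coef  # [intercept, one coefficient per feature]


# Synthetic daily series for one hypothetical region (not study data).
rng = np.random.default_rng(0)
cases = rng.poisson(20, size=60).astype(float)
mobility = rng.uniform(0.5, 1.5, size=60)  # hypothetical internal-movement index

avg7_lag1 = lag(trailing_mean(cases, 7), 1)  # 1-day lag of past 7-day average
target = 2.0 + 0.5 * avg7_lag1 + 3.0 * mobility  # noiseless synthetic target
coef = fit_ols([avg7_lag1, mobility], target)
print(coef)  # approximately [2.0, 0.5, 3.0]
```

For the test positivity rate, the same helpers give the reported 3-day lag as `lag(trailing_mean(positivity, 7), 3)`, where `positivity` is again a hypothetical daily series. Because the target here is an exact linear function of the features, the fit recovers the generating coefficients; real data would leave residual error, which is what the RMSE figures above measure.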

Conclusions

Using anonymized, macro-level human mobility data along with health data aids predictions of when and where COVID-19 outbreaks are likely to occur. Our method provides a useful tool for government decision makers, particularly in the post-vaccination era, when focused interventions are needed to contain COVID-19 outbreaks while mitigating the collateral damage from more global restrictions.


  1. SciScore for 10.1101/2021.05.20.21257557:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our analysis has several limitations. We assumed that both the mobility and the health data were relatively accurate estimates of the true amounts of travel and prevalence of COVID-19 in a region, respectively. If the health data for a given district is skewed due to selection bias in who receives tests, forecasts for other districts would be affected through the mobility data. Districts were included in our dataset only when one statistical region within the district reported at least 15 accumulated cases, tests, and recoveries. Each time a statistical region started to be documented in the health dataset, our dataset experienced an increase in the number of cases that may not reflect an actual outbreak. Future work could develop methods to impute these missing values with constraints based on the total number of reported cases on a day. Smoother data would aid predictions of actual outbreaks as models would be less likely to overfit to random noise in the dataset. Our analysis predicts new cases based on information about known cases and does not take into account cases that were never detected (e.g., asymptomatic cases). Future work could develop methods for adjusting predictions to accurately account for undetected cases. Our models predict new cases more accurately than the test positivity rate. This is because the daily changing sample sizes make it hard to consider the test positivity rate as a consistent stochastic process or to draw conclusions based on the test posi...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.