Nowcasting and forecasting provincial-level SARS-CoV-2 case positivity using Google search data in South Africa

Abstract

Data from non-traditional sources, such as social media, search engines, and remote sensing, have previously demonstrated utility for disease surveillance. Few studies, however, have focused on countries in Africa, particularly during the SARS-CoV-2 pandemic. In this study, we use searches of COVID-19 symptoms, questions, and at-home remedies submitted to Google to model COVID-19 in South Africa, and assess how well the Google search data forecast short-term COVID-19 trends. Our findings suggest that information-seeking trends on COVID-19 could guide models for anticipating COVID-19 trends and coordinating appropriate response measures.

Article activity feed

  1. SciScore for 10.1101/2020.11.04.20226092:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: The search volume for each of the terms relative to the search activity in the region is normalized by Google.
    Resource: Google (suggested: Google, RRID:SCR_017097)
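The sentence quoted in Table 2 describes search volume normalized relative to regional search activity. As a rough illustration of what that kind of normalization can look like, the minimal sketch below divides a term's count by total searches in each period and rescales the result so the peak equals 100. This is an assumption made for illustration only: the function name, the raw counts, and the exact scaling are hypothetical and are not taken from the manuscript or from Google, which does not expose raw counts.

```python
def normalize_search_interest(term_counts, total_counts):
    """Scale a term's share of all searches to a 0-100 index (illustrative only)."""
    if len(term_counts) != len(total_counts):
        raise ValueError("term_counts and total_counts must have the same length")
    # Share of all searches in the region that used the term, per period.
    shares = [t / n if n > 0 else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    # Rescale so the period with the highest share gets a value of 100.
    return [round(100 * s / peak) for s in shares]


# Hypothetical weekly counts for one province (made-up numbers).
weekly_term = [120, 300, 450]
weekly_total = [10_000, 12_000, 15_000]
index = normalize_search_interest(weekly_term, weekly_total)
```

Because the index is relative to both regional search activity and the peak within the chosen window, values from different provinces or time windows are not directly comparable as absolute volumes.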

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There are some limitations to our study. The study period is short, therefore limiting the available data. However, even with the data limitations, we were able to create models with good fit; thus, as the SARS-CoV-2 pandemic continues, we would expect that additional data points would only improve model estimates. Due to the small sample size, confidence intervals (CI) were unreliable. However, CI can be estimated by adding an additional modeling step such as, a generalized linear model (see SI Figure 1 and 2). Second, there might be variability in the at-home remedies that were used in different provinces, and were not captured in our data. We aimed to include an inclusive and comprehensive list of appropriate search terms, and the input by several infectious disease scientists familiar with the South African context provided insight on province-specific cultural and social practices. The inclusion of any further provincial-specific at-home remedies in our search terms would only further improve the model fit. Third, if there are large changes to testing strategies used over time, that may affect the use of these models to forecast case-positivity. Should changes to the testing strategy occur, such as widespread use of rapid antigen diagnostics, or community testing, different outcome metrics may become more reliable for forecasting. Despite these limitations, we demonstrate that there is value in using non-traditional data sources for disease surveillance in sub-Saharan Af...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.