Engagement With COVID-19 Public Health Measures in the United States: A Cross-sectional Social Media Analysis from June to November 2020



Abstract

COVID-19 has continued to spread in the United States and globally. Closely monitoring public engagement and perceptions of COVID-19 and preventive measures using social media data could provide important information for understanding the progress of current interventions and planning future programs.

Objective

The aim of this study is to measure the public’s behaviors and perceptions regarding COVID-19 and its effects on daily life during 5 months of the pandemic.

Methods

Natural language processing (NLP) algorithms were used to identify COVID-19–related and unrelated topics in over 300 million online data sources from June 15 to November 15, 2020. Posts in the sample were geotagged by NetBase, a third-party data provider. Sensitivity and positive predictive value were calculated to validate the classification of posts. Because a single post could discuss multiple topics, topic categories were not mutually exclusive. The prevalence of discussion of each topic was measured over this period and compared with daily COVID-19 case rates in the United States.
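The validation step can be illustrated with a minimal Python sketch. It assumes that each post in a hand-coded validation sample carries both a human label and a classifier label for one topic. The helper names and the labels below are hypothetical, for illustration only, and are not the study's data or the authors' code. Sensitivity is the share of truly on-topic posts that the classifier flags. Positive predictive value (PPV) is the share of flagged posts that are truly on-topic.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # classifier flagged the post and the human coder agreed
    fp: int  # classifier flagged the post but the human coder did not
    fn: int  # human coder marked the post on-topic but the classifier missed it


def confusion_counts(predicted, actual):
    """Tally TP/FP/FN for one topic from paired classifier and human labels."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must be the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return Confusion(tp, fp, fn)


def sensitivity(c):
    """TP / (TP + FN): fraction of truly on-topic posts that were detected."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else float("nan")


def ppv(c):
    """TP / (TP + FP): fraction of flagged posts that are truly on-topic."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else float("nan")


# Hypothetical validation labels (illustrative only, not study data).
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, True]

c = confusion_counts(predicted, actual)
print(f"sensitivity={sensitivity(c):.2f}  ppv={ppv(c):.2f}")
```

Reporting both metrics matters here. A stringent, keyword-driven classifier tends to raise PPV at the cost of sensitivity, so either number alone would give an incomplete picture.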

Results

The final sample included 9,065,733 posts, 70% of which were sourced from the United States. In October and November, discussion mentioning COVID-19 and related health behaviors did not increase as it had from June to September, despite a rise in daily COVID-19 cases in the United States beginning in October. Additionally, discussion focused more on daily life topics (n=6,210,255, 69%) than on COVID-19 in general (n=3,390,139, 37%) or COVID-19 public health measures (n=1,836,200, 20%); because a post could fall into more than one category, these percentages do not sum to 100%.
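Each topic percentage is computed against the full sample, so overlapping categories can together exceed 100%. The short Python check below uses only the counts reported above. It reproduces the rounded shares and shows that their total is above 100%.

```python
# Counts reported in the Results; each post may belong to several topics.
TOTAL_POSTS = 9_065_733
topic_counts = {
    "daily life": 6_210_255,
    "COVID-19 in general": 3_390_139,
    "COVID-19 public health measures": 1_836_200,
}

# Each share uses the full sample as its denominator.
shares = {topic: n / TOTAL_POSTS for topic, n in topic_counts.items()}

for topic, share in shares.items():
    print(f"{topic}: {share:.0%}")  # 69%, 37%, 20%

# Overlapping (multi-label) categories: the total exceeds 100% (about 126%).
print(f"sum of shares: {sum(shares.values()):.0%}")
```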

Conclusions

There was a decline in COVID-19–related social media discussion, sourced mainly from the United States, even as COVID-19 cases in the United States rose to their highest rate since the beginning of the pandemic. Targeted public health messaging may be needed to ensure engagement in public health prevention measures as global vaccination efforts continue.


  SciScore for 10.1101/2021.02.05.21250127

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement (IRB): This study was exempted from Institutional Review Board review by Yale University as it did not engage in research involving human subjects.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
    Cell Line Authentication (Authentication): 23] Signals Analytics, an advanced analytics consultant that conducted the analysis, accessed these data sources through a third-party data vendor, NetBase.[24,25] These social media posts were geotagged by NetBase both directly, by using geolocation data from posts, and indirectly, by using author profiles and unique domain codes (such as .uk).

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has several limitations. First, although our third-party data provider reported that about 70% of posts were from the US, we do not know the location for most posts according to our direct geotagging methods, which were only able to tag about 80% of posts (eTable 2). As a result, we cannot make international comparisons, but our dataset is more representative of the US than of any other country. Second, the number of posts included in our dataset was much lower than in previous studies, likely due to the types of data sources used, which excluded sites such as Twitter in order to exclude noise that might have obscured signals in social media data, and to our methodology, which included removing posts not relevant to our more refined taxonomy. We used a stringent exclusion criterion with a list of prespecified keywords that may also have led to a smaller sample size, but our approach aimed to create a sample with high specificity. Finally, no demographic information is available from the posts themselves due to privacy considerations and data use agreements. Thus, we cannot determine whether our data sample contains biases due to the demographics of the people who post. For instance, Reddit, which was the most common forum source for our data sample, has been found to be used by a younger, male audience.[45,46]

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.