Understanding Adverse Population Sentiment Towards the Spread of COVID-19 in the United States

Abstract

Background

During the ongoing COVID-19 pandemic, the immediate threat of illness and mortality is not the only concern. In the United States, COVID-19 is causing not only physical suffering among patients but also high levels of adverse sentiment (e.g., fear, panic, anxiety) among the public. Such secondary threats can be anticipated and explained through sentiment analysis of social media, such as Twitter.

Methods

We obtained a dataset of geotagged tweets on the topic of COVID-19 in the contiguous United States during the period of 11/1/2019 - 9/15/2020. We classified each tweet as either “adverse” or “non-adverse” using the NRC Emotion Lexicon and tallied the counts for each category per county per day. We utilized the space-time scan statistic to find clusters and a three-stage regression approach to identify socioeconomic and demographic correlates of adverse sentiment.
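
The classification and tallying step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the rule that maps lexicon matches to a label, so the rule here (a tweet is "adverse" if at least one word carries an adverse emotion) is an assumption, as is the choice of which emotions count as adverse. The small dictionary `TOY_LEXICON` is a hypothetical stand-in for the NRC Emotion Lexicon, and the county codes and tweets are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the NRC Emotion Lexicon (word -> emotions).
# The real lexicon is far larger; these entries are illustrative only.
TOY_LEXICON = {
    "afraid": {"fear"},
    "panic": {"fear"},
    "worried": {"fear", "sadness"},
    "angry": {"anger"},
    "hope": {"anticipation", "joy"},
    "safe": {"trust"},
}

# Assumption: which emotions count as "adverse" is not given in the abstract.
ADVERSE_EMOTIONS = {"fear", "anger", "sadness", "disgust"}


def classify(text):
    """Label a tweet 'adverse' if any token maps to an adverse emotion."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for t in tokens if TOY_LEXICON.get(t, set()) & ADVERSE_EMOTIONS)
    return "adverse" if hits > 0 else "non-adverse"


def tally(tweets):
    """Count tweets per (county, day, label) from (county, day, text) records."""
    counts = Counter()
    for county, day, text in tweets:
        counts[(county, day, classify(text))] += 1
    return counts


# Made-up example records: (county code, date, tweet text).
tweets = [
    ("37119", "2020-03-20", "So worried about the virus, total panic at the store"),
    ("37119", "2020-03-20", "Staying home and staying safe"),
    ("06037", "2020-03-21", "I am afraid this will get worse"),
]
counts = tally(tweets)
for key in sorted(counts):
    print(key, counts[key])
```

County-by-day counts of this form are the kind of input a space-time scan statistic would then search for clusters.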

Results

We identified substantial spatiotemporal variation in adverse sentiment across our study area and period. After an initial period of low-level adverse sentiment (11/1/2019 - 1/15/2020), we observed a steep increase and subsequent fluctuation at a higher level (1/16/2020 - 9/15/2020). The number of daily tweets was low initially (11/1/2019 - 1/22/2020), followed by spikes and subsequent decreases until the end of the study period. The space-time scan statistic identified 12 clusters of adverse sentiment of varying size, location, and strength. Clusters were generally active from late March to May/June 2020. Increased adverse sentiment was associated with decreased racial/ethnic heterogeneity, decreased rurality, and higher vulnerability in terms of minority status and language as well as housing type and transportation.

Conclusions

We utilized a dataset of geotagged tweets to identify the spatiotemporal patterns and the spatial correlates of adverse population sentiment during the first two waves of the COVID-19 pandemic in the United States. The characteristics of areas with high adverse sentiment may be relevant for the communication of containment measures. The combination of spatial clustering and regression can be beneficial for understanding the ramifications of COVID-19, as well as of disease outbreaks in general.

Article activity feed

  1. SciScore for 10.1101/2021.07.15.21260543:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentence: For calculation of the index, we included the racial and ethnic categories: Latino/Hispanic, American Indian and Alaska Native, Asian, Black or African American, Native Hawaiian or Pacific Islander, and Non-Hispanic White [49].
    Resource: Non-Hispanic White (suggested: None)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Despite the merits of our study, we want to point out the following weaknesses and future research directions: 1) While a major strength of the lexicon-based sentiment classification approach lies in its simplicity, this method does not allow us to identify more complicated language features, such as sentiment shifters (e.g., “I don’t like this car” is negative, even though the word “like” is not; [15]); 2) Twitter users represent a younger demographic group, whose sentiments and opinions may not reflect those of the entire population. In addition, urban areas tend to be overrepresented in tweet samples [63]. We tried to partially address this issue by including the proportion of the population between 18 and 34 (the main demographic who uses Twitter) but discarded the variable during our modelling process due to unacceptably high correlations with other variables in our model. 3) Our regression modelling approach does not consider the temporal dimension (except for the 1st death variable), despite having a spatiotemporally complete dataset. Therefore, our current and future research efforts focus on the application of spatiotemporally explicit modelling using Bayesian statistics to address the spatial and temporal nature of our dataset [64]. Lastly, due to the real-time availability of data, such as tweets and various metrics on COVID-19, it is feasible to apply our methods and update the results of this study daily. For instance, the space-time scan statistic can be employ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.