An “Infodemic”: Leveraging High-Volume Twitter Data to Understand Early Public Sentiment for the Coronavirus Disease 2019 Outbreak

Abstract

Background

Twitter has been used to track trends and disseminate health information during viral epidemics. On January 21, 2020, the Centers for Disease Control and Prevention activated its Emergency Operations Center and the World Health Organization released its first situation report about coronavirus disease 2019 (COVID-19), sparking significant media attention. How Twitter content and sentiment evolved in the early stages of the COVID-19 pandemic has not been described.

Methods

We extracted tweets matching hashtags related to COVID-19 from January 14 to 28, 2020 using Twitter’s application programming interface. We measured themes and frequency of keywords related to infection prevention practices. We performed a sentiment analysis to identify the sentiment polarity and predominant emotions in tweets and conducted topic modeling to identify and explore discussion topics over time. We compared sentiment, emotion, and topics among the most popular tweets, defined by the number of retweets.
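As an illustration of two of the steps described above (counting infection-prevention keywords and selecting the most popular tweets by retweet count), the following is a minimal sketch and not the authors' code. The keyword list, the `text` and `retweets` field names, and the function names are all assumptions made for the example; the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical keyword stems; the abstract does not list the terms that were tracked.
PREVENTION_KEYWORDS = ("mask", "wash", "quarantine")


def keyword_frequency(tweets, keywords=PREVENTION_KEYWORDS):
    """Count how many tweets contain a word starting with each keyword stem.

    Matching is case-insensitive and anchored at a word boundary, so the
    stem "mask" also matches "masks". Each tweet counts at most once per stem.
    """
    counts = Counter({kw: 0 for kw in keywords})
    for tweet in tweets:
        text = tweet["text"].lower()  # "text" is an assumed field name
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw), text):
                counts[kw] += 1
    return counts


def most_retweeted(tweets, n=100):
    """Return the n tweets with the highest retweet count, highest first."""
    # "retweets" is an assumed field name for the retweet count.
    return sorted(tweets, key=lambda t: t["retweets"], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample records, used only to exercise the two helpers.
    sample = [
        {"text": "Wear a mask and wash your hands #coronavirus", "retweets": 120},
        {"text": "Masks are sold out everywhere", "retweets": 5},
        {"text": "Wuhan quarantine begins today", "retweets": 900},
    ]
    print(keyword_frequency(sample))
    print([t["retweets"] for t in most_retweeted(sample, n=2)])
```

Stem matching is a deliberately simple choice here; a fuller analysis would likely need to handle misspellings and alternative terminology as well.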

Results

We evaluated 126 049 tweets from 53 196 unique users. The hourly number of COVID-19-related tweets starkly increased from January 21, 2020 onward. Approximately half (49.5%) of all tweets expressed fear and approximately 30% expressed surprise. In the full cohort, the economic and political impact of COVID-19 was the most commonly discussed topic. When focusing on the most retweeted tweets, the incidence of fear decreased and topics focused on quarantine efforts, the outbreak and its transmission, as well as prevention.

Conclusions

Twitter is a rich medium that can be leveraged to understand public sentiment in real time and potentially target individualized public health messages based on user interest and emotion.

Article activity feed

  1. SciScore for 10.1101/2020.04.03.20052936:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: (Python Software Foundation) and RStudio version 1.2.1335 (R Foundation for Statistical Computing).
    Resource: RStudio
    Suggested: (RStudio, RRID:SCR_000432)

    Sentence: m Python package [16]) automatically generates topics from observations (in our case, from tweets) and groups similar observations to one or more of these topics using the distribution of words.
    Resource: Python
    Suggested: (IPython, RRID:SCR_001658)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations: This study had several limitations. First, we used a non-comprehensive list of hashtags that was limited by knowledge of trending hashtags and the imagination of the authors. We may have missed alternative terminology or misspellings and may have introduced some selection bias in the tweets we analyzed. For example, #wuhanoutbreak was not included, but arose as a weighted term in our topic modeling. Conversely, #coronavirus may have identified tweets related to other infections such as Severe Acute Respiratory Syndrome. Second, despite the large number of tweets analyzed (>126K), we collected and analyzed only a subset (1%) of all tweets, which may also introduce some selection bias. However, using the Twitter API, we were assured that the sample constituted a representative subset of the entire stream. Third, we targeted tweets in the English language; thus, our conclusions may not be generalizable to other countries where English is not the predominant language. Lastly, we recognize that ascribing topic themes based on a subset of weighted terms has opportunity for labeling bias. To mitigate that, two authors designed the topic model and a separate set of authors labeled the topic themes.

    Conclusions: We show that the frequency of tweets was associated with the number of infected individuals for the early stages of the COVID-19 pandemic. Tweets predominantly showed negative sentiment and were linked to emotions of fear primarily, as well as surprise and anger. ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.