COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and that both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
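To make the Naïve Bayes classification step mentioned in the abstract more concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the authors' implementation: the study used R and its packages, whereas this sketch uses plain Python, and the whitespace tokenizer, the function names, the two labels and the toy Tweets are all assumptions invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        # Log prior: share of training documents carrying this label.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood of the word given the class.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration; not from the study's corpus.
TOY_TWEETS = [
    ("scared of the virus spreading", "negative"),
    ("panic and fear everywhere", "negative"),
    ("so worried about lockdown", "negative"),
    ("grateful for the nurses and doctors", "positive"),
    ("hope and recovery ahead", "positive"),
    ("thankful for good news today", "positive"),
]

model = train_naive_bayes(TOY_TWEETS)
prediction_a = predict(model, "fear and panic about the virus")
prediction_b = predict(model, "grateful for hope")
```

On this toy data the first example is assigned the negative label and the second the positive label. The accuracies reported in the abstract come from the authors' own corpus and R workflow and cannot be reproduced from this sketch.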
Article activity feed
SciScore for 10.1101/2020.06.01.20119347: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: This research used R along with Wordcloud and Wordcloud2 packages, while other packages in R and Python are also available with unique Wordcloud plotting capabilities. 3.2.
Resources: Python, suggested: (IPython, RRID:SCR_001658)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Limitations: The current study has focused on a textual corpus consisting of Tweets filtered by “Coronavirus” as the keyword. Therefore the analysis and the methods are specifically applied to data about a particular pandemic as a crisis situation, and hence it could be argued that the analytical structure outlined in this paper can only be weakly generalized. Future research could address this and explore “alternative dimensionalities and perform sensitivity analysis” to improve the validity of the insights gained [62]. Furthermore, the analysis used one sentiment lexicon to identify positive and negative sentiments, and one sentiment lexicon to classify the tweets into categories such as fear, sadness, anger and disgust [7,54,55]. Varying information categories have the potential to influence human beliefs and decision making [63], and hence it is important to consider multiple social media platforms with differing information formats (such as short text, blogs, images and comments) to gain a holistic perspective. The present study intended to generate rapid insights for COVID-19 related public sentiment using Twitter data, which was successfully accomplished. This study also intended to explore the viability of machine learning classification methods, and we found sufficient directional support for the use of Naïve Bayes and Logistic classification for short to medium length Tweets, but the accuracy decreased with the increase in the length of Tweets. We have not stated a ...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a protocol registration statement.