Can Social Media Data Be Utilized to Enhance Early Warning: Retrospective Analysis of the U.S. Covid-19 Pandemic

Abstract

The U.S. needs early warning systems to help contain the spread of infectious diseases. Conventional early warning systems rely on lab-test results or dynamic records to signal early warning signs. Newer systems can supplement these data with indicators of public awareness, such as news articles and search queries. This study explores the potential of social media data to enhance early warning of the COVID-19 outbreak. To demonstrate feasibility, it retrospectively analyzes more than 14 million related Twitter postings from January 20 to March 10, 2020. Using natural language processing tools and machine learning classifiers, the study classifies each tweet as either a signal or a non-signal, where a “signal” tweet implies that the user recognized the COVID-19 outbreak risk in the U.S. The study then proposes a parameter, the “signal ratio”, to flag warning signs of the COVID-19 pandemic over successive periods. Results indicate that social media data and the signal ratio can detect hazards ahead of the COVID-19 outbreak. This finding is supported by a lead time of 16 days in comparison with reference methods based on Google Trends or media news.
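
The abstract names a “signal ratio” but does not give its formula. As a rough illustration only, the sketch below assumes the ratio is the share of signal tweets among all classified tweets in each day. The function names, the toy data, and the 0.5 alert threshold are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date


def signal_ratio_by_day(labeled_tweets):
    """Return {day: signal tweets / all tweets} for (day, is_signal) pairs.

    Assumption: the ratio is a simple per-day proportion; the paper may
    define it differently.
    """
    totals = defaultdict(int)
    signals = defaultdict(int)
    for day, is_signal in labeled_tweets:
        totals[day] += 1
        if is_signal:
            signals[day] += 1
    return {day: signals[day] / totals[day] for day in sorted(totals)}


def first_warning_day(ratios, threshold):
    """Return the earliest day whose ratio reaches the threshold, else None."""
    for day in sorted(ratios):
        if ratios[day] >= threshold:
            return day
    return None


# Toy, made-up classifier output: (posting date, classified as signal?)
sample = [
    (date(2020, 1, 20), False),
    (date(2020, 1, 20), False),
    (date(2020, 1, 20), True),
    (date(2020, 1, 20), False),
    (date(2020, 1, 21), True),
    (date(2020, 1, 21), False),
]

ratios = signal_ratio_by_day(sample)
warning_day = first_warning_day(ratios, threshold=0.5)  # hypothetical threshold
```

In a retrospective comparison, the day returned by `first_warning_day` could be set against the first alert date of a reference method to estimate a lead time.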

SciScore for 10.1101/2021.04.11.21255285

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
In this research context, all words in a tweet were represented with numerical information, and each tweet was mapped into a numerical vector for classification. E. TEXT CLASSIFICATION: After each tweet was converted to a vector of features using TF-IDF, we applied several machine learning classifiers provided by Scikit-learn Python library [53], including Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Naïve Bayes (NB) to build the pipeline for text classification.	Scikit-learn, suggested: (scikit-learn, RRID:SCR_002577); Python, suggested: (IPython, RRID:SCR_001658)
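
The quoted sentence describes TF-IDF features feeding four scikit-learn classifiers. A minimal sketch of such a pipeline follows. The toy tweets, the labels, and every hyperparameter are hypothetical, and the choice of `LinearSVC` for the SVM and `MultinomialNB` for Naïve Bayes is an assumption, since the excerpt does not say which variants the authors used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training examples: 1 = signal, 0 = non-signal.
texts = [
    "worried the virus will spread across the us soon",
    "we should prepare for an outbreak in our city",
    "hospitals here are not ready for this epidemic",
    "cases will reach the united states any day now",
    "watching a documentary about past pandemics",
    "the virus news overseas is all over my feed",
    "stock market reacts to trade headlines",
    "new phone release date announced today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# One estimator per classifier family named in the excerpt (variants assumed).
classifiers = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "NB": MultinomialNB(),
}

# Each pipeline converts raw text to TF-IDF vectors, then fits the classifier.
fitted = {}
for name, clf in classifiers.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    fitted[name] = model

# Classify one unseen, made-up tweet with every fitted pipeline.
new_tweet = ["people should prepare before the outbreak reaches the us"]
predictions = {name: int(model.predict(new_tweet)[0]) for name, model in fitted.items()}
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary tied to the training data, so the same transformation is applied when new tweets are classified.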

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Related articles

Enhancing Early Warning Outbreak Detection Using Multi Model Stacking Ensemble

A Decade of CDC FluSight Influenza Forecasting

EpidBot: A Natural Language Platform for Generalized Epidemic Intelligence