Presenting a new method for identifying and extracting keywords on Twitter related to Covid-19

Seyedeh Fatemeh Langari
Hassan Saneifar
Meraj Hejazi
Hassan Ahmadi Choukalaei

Abstract

Hashtags and keywords are closely related in content generated on social media platforms. Retrieving and categorizing information on a specific topic on Twitter remains difficult: hashtags act as approximate indicators of tweet topics, but their ambiguity and flexible usage make it hard to search for content on a given topic. Keyword extraction is a vital step in representing the main content of a post or a set of posts, since keywords usually correlate most strongly with the textual content. Extracting them correctly makes it possible to analyze a text's topic comprehensively and to support critical decisions. Research on the relationship between hashtags and keywords is therefore important for improving the search and categorization of content on social media platforms. This study evaluated the semantic relationship between the keyword sets of a Twitter dataset about the coronavirus disease and the hashtags embedded in its tweets. The dataset comprises 364,964 English-language tweets, each containing up to 280 characters and no images. Hashtag validation focused specifically on the coronavirus vaccine; we therefore assume that this topic is identified only by a specific hashtag. We introduce a novel method that uses semantic graph visualization and ranking techniques for keyword extraction. The proposed model constructs a semantic graph and then applies centrality measures to assign weights to its nodes. The similarity between keywords and hashtags was then evaluated using three methods. Finally, two machine learning algorithms were implemented to distinguish tweets that are relevant to the hashtag from those that are not. 
The results of the two classification algorithms, with 73% and 96% accuracy, respectively, indicate that this approach can effectively validate the relationship between keywords and hashtags.
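
To make the pipeline described in the abstract more concrete, here is a minimal sketch of graph-based keyword ranking. It is not the authors' implementation. The abstract does not specify how the semantic graph is built, which centrality measures are applied, or which three similarity methods are used, so this sketch substitutes a word co-occurrence graph, PageRank computed by power iteration, and Jaccard overlap. The stopword list, the function names, and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "my", "i", "is", "are", "was", "were", "of", "to", "in", "for"}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs, @mentions and #hashtags, keep non-stopword tokens."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet.lower())
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]


def build_graph(tweets):
    """Undirected graph; an edge weight counts the tweets in which two words co-occur."""
    graph = defaultdict(lambda: defaultdict(float))
    for tweet in tweets:
        for u, v in combinations(sorted(set(tokenize(tweet))), 2):
            graph[u][v] += 1.0
            graph[v][u] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (stand-in for the unspecified centrality measures)."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Every node was created through an edge, so its total edge weight is positive.
    strength = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nbr] * w / strength[nbr] for nbr, w in graph[node].items())
            for node in nodes
        }
    return rank


def top_keywords(tweets, k=5):
    """Return the k highest-ranked words in the co-occurrence graph."""
    rank = pagerank(build_graph(tweets))
    return [w for w, _ in sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def jaccard(a, b):
    """Jaccard overlap of two token collections (one hypothetical similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical tweets that all carry the same hashtag; the last one is off-topic.
tweets = [
    "Got my vaccine dose today #CovidVaccine",
    "Second vaccine dose booked, vaccine side effects were mild #CovidVaccine",
    "Vaccine rollout and dose schedule announced #CovidVaccine",
    "Great football match tonight #CovidVaccine",
]

keywords = top_keywords(tweets, k=2)
for tweet in tweets:
    # Overlap between a tweet's words and the top keywords as a rough relevance signal.
    print(round(jaccard(tokenize(tweet), keywords), 2), tweet)
```

In this toy setting the off-topic tweet shares no words with the top-ranked keywords, which is the kind of signal a downstream classifier could use as a feature. A real system would need a much richer semantic graph and a trained model, as the abstract indicates.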

Version published to 10.21203/rs.3.rs-7832005/v1 on Research Square
Nov 6, 2025

Related articles

Content-based detection of misinformation expands its scope across politicians and platforms

This article has 4 authors:
1. Sami Nenno
2. Cornelius Puschmann
3. Kamil Fuławka
4. Philipp Lorenz-Spreen
This article has no evaluations. Latest version: Nov 1, 2025

A Systematic Review of Sentiment Analysis Systems Applied to Textual Data

This article has 2 authors:
1. Phuong Dao Quoc
2. Vuong M. Ngo
This article has no evaluations. Latest version: Oct 3, 2025

Email Summarizer: A Novel Hybrid Approach to Email Summarization

This article has 4 authors:
1. Rahul Kumar Yadav
2. Anupama Namburu
3. Siddhant Sharma
4. Qutaiba Humadi Mohammed
This article has no evaluations. Latest version: Oct 30, 2025
