Presenting a new method for identifying and extracting keywords on Twitter related to Covid-19
Abstract
The relationship between hashtags and keywords is fundamental to content generated on social media platforms. Retrieving and categorizing information about a specific topic on Twitter remains difficult. Hashtags act as approximate indicators of tweet topics, but because their usage is ambiguous and flexible, searching for content related to a specific topic is still challenging. Extracting keywords from Twitter is a vital step in representing the main content of a post or a set of posts, since these keywords usually correlate most strongly with the textual content. Extracting them correctly makes it possible to analyze a text's topic comprehensively and to support critical decisions. Research on the relationship between hashtags and keywords is therefore important, and it has become a necessity given its fundamental role in improving the search and categorization of content on social media platforms. Consequently, this study evaluated the semantic relationship between the keyword sets of a Twitter dataset on the coronavirus disease and the hashtags embedded in its tweets. The dataset comprised 364,964 English-language tweets, each containing up to 280 characters and no images. Hashtag validation focused explicitly on the coronavirus vaccine; hence, we assume this topic is identified only by a specific hashtag. To this end, a novel method is introduced that uses semantic graph visualization and ranking techniques for keyword extraction. The proposed model constructs a semantic graph and then applies centrality measures to assign weights to its nodes. Subsequently, the similarity between keywords and hashtags was evaluated using three methods. Finally, two machine learning algorithms were implemented to distinguish relevant from irrelevant tweets with the hashtag.
The results of the two classification algorithms, with 73% and 96% accuracy, respectively, indicate that this approach can effectively validate the relationship between keywords and hashtags.
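
The graph-and-centrality step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the semantic graph is built or which centrality measures are used, so everything below is an assumption made for illustration: edges link words that co-occur in the same tweet, degree centrality stands in for the unspecified measures, and the stopword list, sample tweets, and function names (`tokenize`, `build_graph`, `degree_centrality`, `top_keywords`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the study).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for"}


def tokenize(tweet):
    """Lowercase a tweet; drop hashtags, mentions, URLs, and stopwords."""
    tokens = []
    for raw in tweet.lower().split():
        if raw.startswith(("#", "@", "http")):
            continue
        word = raw.strip(".,!?;:\"'()")
        if word and word not in STOPWORDS:
            tokens.append(word)
    return tokens


def build_graph(tweets):
    """Undirected co-occurrence graph: two words are linked if they share a tweet."""
    graph = defaultdict(set)
    for tweet in tweets:
        words = sorted(set(tokenize(tweet)))
        for u, v in combinations(words, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Node degree divided by (n - 1), the largest degree a node could have."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def top_keywords(tweets, k=3):
    """Return the k highest-centrality words, breaking ties alphabetically."""
    scores = degree_centrality(build_graph(tweets))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Vaccine rollout starts #covid19",
    "Vaccine side effects mild #vaccine",
    "Rollout delayed again",
]
keywords = top_keywords(sample_tweets, k=2)
```

In this sketch the highest-ranked words serve as candidate keywords, which could then be compared against a tweet's hashtags with a similarity measure of the reader's choice. A different graph construction or centrality measure would change the ranking.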