Utilizing LLMs and ML Algorithms in Disaster-Related Social Media Content
Abstract
In this research, we explore the use of Large Language Models (LLMs) and clustering techniques to automate the structuring and labeling of disaster-related social media content. Using a dataset of millions of tweets related to various disasters, our approach aims to transform unstructured, unlabeled data into a structured, labeled format that can be readily used for training machine learning algorithms and enhancing disaster response efforts. We leverage LLMs to preprocess the tweets and understand their semantic content, assigning several semantic properties to the data. We then apply clustering techniques to identify emerging themes and patterns that predefined categories may not capture; these are surfaced through topic extraction on the clusters. Finally, we manually label 10,000 examples to evaluate the LLMs' ability to understand tweet features. Our methodology is applied to real-world data from disaster events, with results directly applicable to actual crisis situations.
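The embed, cluster, and extract-topics sequence described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes normalized bag-of-words vectors as a stand-in for LLM embeddings and a small k-means as the clustering step, since the abstract names neither the embedding model nor the clustering algorithm. All function names and the sample tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "as", "is", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def embed(text, vocab):
    """Stand-in for an LLM embedding: an L2-normalized bag-of-words vector."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iters=20):
    """Small k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def top_terms(tweets, labels, n=3):
    """Topic extraction stand-in: the n most frequent tokens in each cluster."""
    topics = {}
    for cluster in sorted(set(labels)):
        counts = Counter(
            w for t, lab in zip(tweets, labels) if lab == cluster for w in tokenize(t)
        )
        topics[cluster] = [w for w, _ in counts.most_common(n)]
    return topics

# Hypothetical sample tweets, invented for this sketch.
tweets = [
    "Flood water rising downtown, roads closed",
    "River flood warning, water covering roads",
    "Wildfire smoke spreading, evacuation ordered",
    "Evacuation underway as wildfire smoke thickens",
]

vocab = sorted({w for t in tweets for w in tokenize(t)})
vectors = [embed(t, vocab) for t in tweets]
labels = kmeans(vectors, k=2)
topics = top_terms(tweets, labels)

print(labels)
print(topics)
```

In a real pipeline, the `embed` function would presumably be replaced by calls to an embedding model and the frequency-based `top_terms` by a more robust topic extraction method. The sketch only shows how the stages connect.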