Exploration of Large Language Models for Geotagging of Social Media Posts
Abstract
Accurate geolocation of social media data is important for applications such as crisis response, public health surveillance, and city planning. However, fewer than 3% of posts on mainstream platforms include geographic information, e.g. the latitude/longitude of where the post was made. Classic text-based geotagging infers location through statistical approaches; however, these are prone to misclassification when location information in the user profile or in the post text itself is missing or ambiguous. In this paper, we propose a hierarchical geotagging method based on large language models (LLMs), with a spatial hierarchy reflecting the administrative divisions of Australia. The solution uses multi-task classification with information transfer across the levels of the hierarchy. The proposed method was evaluated on a large dataset of social media messages, comparing flat and hierarchical spatial structures across diverse LLMs. The results show that the hierarchical model outperforms the baseline by at least 10% on suburb-level accuracy. Furthermore, the results indicate improved calibration and robustness to noisy and truncated data, while supporting privacy awareness through control over the granularity of the geographic information. Overall, these results indicate the efficacy of hierarchical LLM-based geotagging models for reliable and privacy-aware location prediction from social media data.
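The abstract does not say how the granularity of a prediction is controlled, so the following is only a minimal sketch of one way it could work, not the authors' method. It assumes a classifier has already produced a probability for each suburb-level path of the form (state, city, suburb); the function names, the confidence threshold, the `max_depth` privacy cap, and the toy place names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Path = Tuple[str, ...]  # e.g. ("Victoria", "Melbourne", "Carlton")


def prefix_mass(leaf_probs: Dict[Path, float]) -> Dict[Path, float]:
    """Sum leaf probabilities upward so every coarser region gets the
    total probability of the suburbs it contains."""
    mass: Dict[Path, float] = defaultdict(float)
    for path, p in leaf_probs.items():
        for depth in range(1, len(path) + 1):
            mass[path[:depth]] += p
    return dict(mass)


def finest_confident_prefix(
    leaf_probs: Dict[Path, float],
    threshold: float = 0.6,  # hypothetical confidence cut-off
    max_depth: int = 3,      # hypothetical privacy cap: 1=state, 2=city, 3=suburb
) -> Path:
    """Walk the hierarchy top-down, descending one level at a time while
    the most probable child region still meets the threshold."""
    mass = prefix_mass(leaf_probs)
    current: Path = ()
    for depth in range(1, max_depth + 1):
        # Children of the region chosen so far.
        candidates = [
            p for p in mass if len(p) == depth and p[: depth - 1] == current
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda p: mass[p])
        if mass[best] < threshold:
            break  # not confident enough to go finer
        current = best
    return current


# Toy, made-up probabilities for a single post.
probs = {
    ("Victoria", "Melbourne", "Carlton"): 0.45,
    ("Victoria", "Melbourne", "Fitzroy"): 0.35,
    ("New South Wales", "Sydney", "Newtown"): 0.20,
}
print(finest_confident_prefix(probs))
print(finest_confident_prefix(probs, max_depth=1))
```

With these toy numbers the model is confident about the city but not about any single suburb, so the prediction stops at the city level; lowering `max_depth` caps the output at a coarser level regardless of confidence.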