Mining Social Media Data for Influenza Vaccine Effectiveness Using a Large Language Model and Chain-of-Thought Prompting
Abstract
Influenza vaccine effectiveness (VE) estimation plays a critical role in public health decision-making by quantifying the real-world impact of vaccination campaigns and guiding policy adjustments. Current approaches to VE estimation are constrained by limited population representation, selection bias, and delayed reporting. To address some of these gaps, we propose using large language models (LLMs) with few-shot chain-of-thought (CoT) prompting to mine social media data for real-time influenza VE estimation. We annotated over 4,000 tweets from the 2020–2021 flu season using structured guidelines and achieved high inter-annotator agreement. Our best prompting strategy achieves F1 scores above 87% for identifying influenza vaccination status and test outcomes, outperforming traditional supervised fine-tuning methods by large margins. These findings indicate that LLM-based prompting effectively identifies social media information relevant to influenza VE estimation, offering a valuable real-time surveillance tool that complements traditional epidemiological methods.
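The abstract does not state how the extracted labels are turned into a VE figure. One common epidemiological approach that uses exactly these two variables, vaccination status and test outcome, is the test-negative design. In that design, VE is estimated as 1 minus the odds ratio of vaccination among test-positive versus test-negative cases. The sketch below assumes that design. It also assumes that tweets with unknown status or outcome have already been filtered out. The names `TweetLabel` and `vaccine_effectiveness` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TweetLabel:
    # Hypothetical record of what an LLM might extract from one tweet.
    vaccinated: bool    # reported influenza vaccination status
    flu_positive: bool  # reported influenza test outcome


def vaccine_effectiveness(labels):
    """Sketch of VE = 1 - OR under an assumed test-negative design.

    2x2 table cells:
      a = vaccinated, test-positive     b = vaccinated, test-negative
      c = unvaccinated, test-positive   d = unvaccinated, test-negative
    OR = (a/b) / (c/d) = (a*d) / (b*c)
    """
    a = sum(1 for t in labels if t.vaccinated and t.flu_positive)
    b = sum(1 for t in labels if t.vaccinated and not t.flu_positive)
    c = sum(1 for t in labels if not t.vaccinated and t.flu_positive)
    d = sum(1 for t in labels if not t.vaccinated and not t.flu_positive)
    if b == 0 or c == 0:
        # The odds ratio is undefined when a denominator cell is empty.
        raise ValueError("odds ratio undefined: empty b or c cell")
    odds_ratio = (a * d) / (b * c)
    return 1.0 - odds_ratio
```

As an illustrative example, with made-up counts a = 10, b = 40, c = 30, and d = 20, the odds ratio is 200/1200 = 1/6, so the estimated VE is about 0.83. A real estimate would also need confidence intervals and adjustment for confounders, which this sketch omits.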