Integrating Retrieval-Augmented Generation and Thematic NLP for Vaccine Confidence Modeling in Alaska
Abstract
Vaccine misinformation poses a significant public health threat, particularly in communities with varying levels of vaccine confidence. This study investigated vaccine hesitancy across Alaska’s diverse communities by triangulating public sentiment from social media with individual beliefs gathered through qualitative interviews. The aim was to explore how online discourse influences vaccine-related decision-making and to develop tools for real-time misinformation detection.

We employed a mixed-methods approach, analyzing 1,300 Alaska-specific tweets and conducting 87 semi-structured interviews across urban and rural communities. A Retrieval-Augmented Generation (RAG) system was developed, integrating the context-rich LLaMA-2-7B model with the efficient T5-Base model to balance accuracy and computational performance. The system used sentence embeddings and FAISS-based similarity search to identify misinformation themes and generate context-aware responses grounded in real-world data.

Sentiment analysis revealed that rural social media posts exhibited significantly higher negativity and misinformation (55.6% negative sentiment) compared to urban posts. In contrast, interview data reflected more balanced and nuanced attitudes toward vaccination. Thematic analysis identified systemic distrust and personal beliefs, particularly among Indigenous and rural populations, as key drivers of hesitancy. Model evaluation showed that LLaMA-2-7B outperformed T5-Base in contextual accuracy, while T5-Base offered faster but occasionally less accurate responses.

By combining AI-driven insights with ethnographic data, this study highlights the divergence between online narratives and lived experiences. The proposed framework offers a scalable, real-time method for detecting misinformation and informing culturally responsive public health messaging. Future work will focus on optimizing system efficiency and collaborating with digital platforms to reduce the spread of viral misinformation.
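
The retrieval step named in the methods (sentence embeddings plus similarity search, followed by grounded generation) can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the FAISS index type, or the prompt format, so everything below is an assumption: `embed` is a toy bag-of-words stand-in for a learned sentence embedding, `search` is a pure-Python exhaustive inner-product search (the computation an exact flat FAISS index performs), and the example posts are invented for illustration and are not study data.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy stand-in for a sentence embedding: L2-normalised word counts.

    Assumption: the study used a learned sentence-embedding model; this
    bag-of-words vector only keeps the example dependency-free.
    """
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query_vec, index, k=2):
    """Exhaustive inner-product search over all stored vectors.

    On unit-length vectors the inner product equals cosine similarity.
    Returns the k best (score, position) pairs, highest score first.
    """
    scores = [
        (sum(q * d for q, d in zip(query_vec, doc_vec)), i)
        for i, doc_vec in enumerate(index)
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return scores[:k]


# Hypothetical posts, invented for illustration only (not study data).
posts = [
    "vaccine side effects worry me",
    "clinic hours in the village changed",
    "i do not trust the vaccine",
]
vocab = sorted({word for post in posts for word in post.lower().split()})
index = [embed(post, vocab) for post in posts]

query = "can i trust the vaccine"
hits = search(embed(query, vocab), index, k=2)
retrieved = [posts[i] for _, i in hits]

# The retrieved posts become the context handed to the generator
# (LLaMA-2-7B or T5-Base in the study); this prompt layout is an assumption.
prompt = "Context:\n" + "\n".join(retrieved) + "\n\nQuestion: " + query
```

In a fuller pipeline, the stand-in pieces would be replaced by a real sentence-embedding model and a FAISS index, and `prompt` would be passed to the chosen language model; the retrieve-then-generate control flow stays the same.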