Integrating Retrieval-Augmented Generation and Thematic NLP for Vaccine Confidence Modeling in Alaska

Luay Abdeljaber
Sultan Alsarra
Latifur Khan
Renee F. Robinson
Ana Lorena Ruano
Ubydul Haque

Abstract

Vaccine misinformation poses a significant public health threat, particularly in communities with varying levels of vaccine confidence. This study investigated vaccine hesitancy across Alaska’s diverse communities by triangulating public sentiment from social media with individual beliefs gathered through qualitative interviews. The aim was to explore how online discourse influences vaccine-related decision-making and to develop tools for real-time misinformation detection.

We employed a mixed-methods approach, analyzing 1,300 Alaska-specific tweets and conducting 87 semi-structured interviews across urban and rural communities. A Retrieval-Augmented Generation (RAG) system was developed, integrating the context-rich LLaMA-2-7B model with the efficient T5-Base model to balance accuracy and computational performance. The system used sentence embeddings and FAISS-based similarity search to identify misinformation themes and generate context-aware responses grounded in real-world data.

Sentiment analysis revealed that rural social media posts exhibited significantly higher negativity and misinformation (55.6% negative sentiment) compared to urban posts. In contrast, interview data reflected more balanced and nuanced attitudes toward vaccination. Thematic analysis identified systemic distrust and personal beliefs, particularly among Indigenous and rural populations, as key drivers of hesitancy. Model evaluation showed that LLaMA-2-7B outperformed T5-Base in contextual accuracy, while T5-Base offered faster but occasionally less accurate responses.

By combining AI-driven insights with ethnographic data, this study highlights the divergence between online narratives and lived experiences. The proposed framework offers a scalable, real-time method for detecting misinformation and informing culturally responsive public health messaging. Future work will focus on optimizing system efficiency and collaborating with digital platforms to reduce the spread of viral misinformation.
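The retrieval step mentioned in the abstract (embed texts, find the most similar ones, ground the generated response in them) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, substitutes normalized word counts for sentence embeddings, and substitutes a brute-force inner-product scan for a FAISS index. The function names and the example posts are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: L2-normalized word counts."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def similarity(a, b):
    """Inner product of two sparse unit vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def retrieve(query, corpus, k=2):
    """Exhaustive nearest-neighbor search (assumed stand-in for a FAISS index)."""
    q = embed(query)
    scored = [(similarity(q, embed(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query, passages):
    """Place the retrieved passages ahead of the question for a generator model."""
    context = "\n".join(f"- {doc}" for _, doc in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical example posts, not data from the study.
CORPUS = [
    "The clinic in our village ran out of vaccine doses last week.",
    "I do not trust the vaccine because it was developed too fast.",
    "Great fishing weather on the Kenai this weekend.",
]

QUERY = "Why do people distrust the vaccine?"
top = retrieve(QUERY, CORPUS, k=2)
prompt = build_prompt(QUERY, top)
```

In a system like the one described, the resulting prompt would be passed to a generator such as LLaMA-2-7B or T5-Base; that call is omitted here because the abstract does not specify how the models were invoked.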

Version published to 10.21203/rs.3.rs-7368501/v1 on Research Square
Aug 19, 2025

Related articles

Integrating Explainability for Sentiment Interpretation, Misclassification, and Bias Detection in Women-in-STEM Social Media

This article has 2 authors:
1. Shereen Fouad
2. Ezzaldin Alkooheji
This article has no evaluations
Latest version Jan 12, 2026

Enhanced Language Models for Predicting and Understanding HIV Care Disengagement: A Case Study in Tanzania

This article has 17 authors:
1. Waverly Wei
2. Junzhe Shao
3. Rita Qiuran Lyu
4. Rebecca Hemono
5. Xinwei Ma
6. Joseph Giorgio
7. Zeyu Zheng
8. Feng Ji
9. Xiaoya Zhang
10. Emmanuel Katabaro
11. Matilda Mlowe
12. Amon Sabasaba
13. Caroline Lister
14. Siraji Shabani
15. Prosper Njau
16. Sandra I. McCoy
17. Jingshen Wang
This article has no evaluations
Latest version Jan 14, 2026

Exploration of Large Language Models for Geotagging of Social Media Posts

This article has 2 authors:
1. Riwaz Udas
2. Richard Sinnott
This article has no evaluations
Latest version Feb 3, 2026