A Methodology Framework for Analyzing Health Misinformation to Develop Inoculation Interventions Using Large Language Models: A Case Study on COVID-19


Abstract

Background

The rapid growth of social media as an information channel has enabled the swift spread of inaccurate or false health information, significantly impacting public health. This widespread dissemination of misinformation has caused confusion, eroded trust in health authorities, led to noncompliance with health guidelines, and encouraged risky health behaviors. Understanding the dynamics of misinformation on social media is essential for devising effective public health communication strategies.

Objective

This study presents a comprehensive, automated approach that leverages Large Language Models (LLMs) and Machine Learning (ML) techniques to detect misinformation on social media, uncover its underlying causes and themes, and generate refutation arguments, thereby helping to curb its spread and promote public health by inoculating people against health misinformation.

Methods

We use two datasets to train three LLMs, namely BERT, T5, and GPT-2, to classify documents into two categories: misinformation and non-misinformation. Additionally, we employ a separate dataset to identify misinformation topics. To analyze these topics, we apply three topic modeling algorithms—Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic—and select the optimal model based on performance across three metrics. Using a prompting approach, we extract sentence-level representations for the topics to uncover their underlying themes. Finally, we design a prompt capable of identifying misinformation themes effectively.
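The abstract does not detail how the three evaluation metrics are computed. As a rough illustration of one of them, Normalized Pointwise Mutual Information (NPMI) scores a topic by how often its top words co-occur in documents, normalized to the range [-1, 1]. The sketch below is a minimal stdlib implementation over tokenized documents; it is an assumption-laden approximation, not the paper's actual evaluation code (which presumably uses a library such as Gensim's CoherenceModel).

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs in a topic.

    topic_words: list of top words for one topic.
    documents:   list of token lists (the reference corpus).
    NPMI(w1, w2) = PMI(w1, w2) / -log p(w1, w2), in [-1, 1].
    """
    n = len(documents)
    doc_sets = [set(d) for d in documents]  # document-level co-occurrence

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

A topic model's overall NPMI is then the mean of this score across its topics; words that always co-occur score 1.0, words that never co-occur score -1.0, so values slightly below zero (like the -0.086 reported here) indicate near-independence of the top words.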

Results

The trained BERT model demonstrated exceptional performance, achieving 98% accuracy in classifying misinformation and non-misinformation, with a 44% reduction in the false positive rate for AI-generated misinformation. Among the three topic modeling approaches, BERTopic performed best, achieving a Coherence Value (CV) of 0.41, Normalized Pointwise Mutual Information (NPMI) of -0.086, and Inverted Rank-Biased Overlap (IRBO) of 0.99. To address unclassified documents, we developed an algorithm that assigns each such document to its closest topic. We also proposed a novel prompt-engineering method to generate sentence-level representations for each topic; three independent raters judged 99.6% of these representations "appropriate" or "somewhat appropriate." Finally, we designed a prompt to identify the themes of misinformation topics and developed another prompt capable of detecting misinformation themes with 80% accuracy.
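The abstract mentions an algorithm that reassigns unclassified (outlier) documents to their closest topic but does not describe it. One plausible minimal version, sketched below with the standard library only, compares a document's term-frequency vector against each topic's top-word vector by cosine similarity and picks the best match. The function names and the term-frequency representation are hypothetical; BERTopic-based pipelines would more likely compare embedding vectors.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_outliers(outlier_docs, topic_top_words):
    """Assign each unclassified document to its closest topic.

    outlier_docs:    list of token lists.
    topic_top_words: dict mapping topic id -> list of top words.
    Returns dict mapping document index -> topic id.
    """
    topic_vecs = {tid: Counter(words) for tid, words in topic_top_words.items()}
    assignments = {}
    for i, doc in enumerate(outlier_docs):
        doc_vec = Counter(doc)
        assignments[i] = max(topic_vecs, key=lambda t: cosine(doc_vec, topic_vecs[t]))
    return assignments
```

For example, a document mentioning "vaccine" and "microchip" would land in a vaccine-conspiracy topic rather than a mask-related one, since its term overlap (and hence cosine score) with that topic's top words is higher.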

Conclusions

This study presents a comprehensive and automated approach to addressing health misinformation on social media using advanced machine learning and natural language processing techniques. By leveraging large language models (LLMs) and prompt engineering, the system effectively detects misinformation, identifies underlying themes, and provides explanatory responses to combat its spread.

(Journal of Medical Internet Research) doi:
