Fact-Checks Can Help Inoculate LLMs Against Disinformation

Abstract

Large language models (LLMs) have become the first source millions consult when evaluating political claims, yet they were trained on an open internet that state-sponsored disinformation operations were designed to pollute. We audit four frontier models across 2,268 responses to 63 fabrications from eight documented influence operations and find that even a single published fact-check can shift a model from hedging about a fabrication to rejecting it outright. Overall, models correctly rejected only 81% of fabrications while repeating disinformation in 3% of cases. Critically, the remaining 16% of responses would have left users unable to determine whether the fabricated claims were true or false. Fact-checks may present an avenue for addressing this vulnerability. Narratives that had been debunked by at least one IFCN-certified fact-checking organization were correctly rejected as disinformation 93% of the time, compared to just 76% of the time for unchecked narratives. Exploratory analyses suggest that this mechanism operates through training data, with models echoing the specific vocabulary of published corrections when available during their training windows. We conclude by discussing how fact-checks designed for human audiences appear to serve as effective protection for LLMs, converting both model uncertainty and outright disinformation into definitive rejections.
