When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior

Shan Chen
Mingye Gao
Kuleen Sasse
Thomas Hartvigsen
Brian Anthony
Lizhou Fan
Hugo Aerts
Jack Gallifant
Danielle S. Bitterman

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Large language models (LLMs) exhibit a critical vulnerability arising from being trained to be helpful: a tendency to comply with illogical requests that would generate misinformation, even when they have the knowledge to identify the request as illogical. This study investigated this vulnerability in the medical domain, evaluating five frontier LLMs using prompts that misrepresent equivalent drug relationships. We tested baseline compliance, the impact of prompts allowing rejection and emphasizing factual recall, and the effects of fine-tuning on a dataset of illogical requests, including out-of-distribution generalization. Results showed concerningly high initial compliance (up to 100%) across all models, prioritizing helpfulness over logical consistency. However, prompt engineering and fine-tuning improved performance, achieving near-perfect rejection rates on illogical requests while maintaining general benchmark performance. This demonstrates that prioritizing logical consistency through targeted training and prompting is crucial for mitigating the risk of medical misinformation and ensuring the safe deployment of LLMs in healthcare.

Version published to 10.21203/rs.3.rs-6206365/v1 on Research Square
Apr 21, 2025

Unmasking Users in IoC: LLMs, Text Profiling, and Privacy Implications

This article has 4 authors:
1. ASIMINA TSOUPLAKI
2. Christos Kalloniatis
3. George Mikros
4. Apostolis Siatras
This article has no evaluationsLatest version May 16, 2025
Countering AI-Generated Misinformation With Pre-Emptive Source Discreditation and Debunking

This article has 7 authors:
1. Emily Rebecca Spearing
2. Constantina I. Gile
3. Amy L. Fogwill
4. Toby Prike
5. Briony Swire-Thompson
6. Stephan Lewandowsky
7. Ullrich K. H. Ecker
This article has no evaluationsLatest version May 21, 2025
It Knew Too Much: On the Unsuitability of LLMs as Replacements for Human Subjects

This article has 2 authors:
1. Amaç Herdağdelen
2. Bogdan State
This article has no evaluationsLatest version May 28, 2025

Listed in

Abstract

Article activity feed

Related articles

Unmasking Users in IoC: LLMs, Text Profiling, and Privacy Implications

Countering AI-Generated Misinformation With Pre-Emptive Source Discreditation and Debunking

It Knew Too Much: On the Unsuitability of LLMs as Replacements for Human Subjects