LLMs Can Do Medical Harm: Stress-Testing Clinical Decisions Under Social Pressure

Abstract

Background

Large language models (LLMs) are entering clinical workflows, yet their effect on clinical decisions and potential for harm are uncertain.

Methods

We measured harmful decision output from an ensemble of 20 LLMs across >10 million clinical scenarios involving safety or ethical dilemmas. Each case was shown under a neutral control condition and six Milgram-style social-pressure conditions, with or without a brief mitigation cue (“verify or escalate if unsafe”). The primary outcome was the proportion of potentially harmful responses. Analyses used two-proportion and χ² tests, with confirmatory mixed-effects logistic models.
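
As an illustration only, the sketch below shows how the primary comparison (harmful-response proportion in unmitigated vs. mitigated arms) could be computed with standard tools. The per-arm counts are hypothetical placeholders chosen to match the 16.6% and 10.1% rates reported in the Results; the abstract does not report per-arm totals, and this is not the study's actual analysis code.

```python
# Minimal sketch of the primary comparison: proportion of harmful responses
# in unmitigated vs. mitigated runs. Counts below are hypothetical placeholders.
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

harmful = np.array([838_000, 510_000])      # assumed harmful counts (unmitigated, mitigated)
totals  = np.array([5_048_400, 5_048_400])  # assumed equal split of the 10,096,800 runs

# Two-proportion z-test on harmful-response rates
z_stat, p_value = proportions_ztest(count=harmful, nobs=totals)
print(f"two-proportion z = {z_stat:.2f}, p = {p_value:.3g}")

# Equivalent chi-squared test on the 2x2 contingency table
table = np.column_stack([harmful, totals - harmful])  # rows: arm; cols: harmful / not harmful
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```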

Results

Across all runs (N = 10,096,800), LLMs produced 1.18 million potentially harmful outputs (11.7%). Mitigation reduced harmful decisions from 16.6% to 10.1% (p < 0.001). Under social pressure, models behaved predictably but unevenly: prompts framed as authority or responsibility transfer generated the most harmful responses, whereas neutral, pressure-free control prompts produced the fewest (mitigated 8.3–9.6%; unmitigated 14.3–16.0%; χ² p < 0.001). In other words, when told what to do, or told that someone else would take responsibility, models were more likely to comply, even when the instruction was unsafe. These effects were consistent across datasets and models.
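
As a quick consistency check (assuming N counts individual model runs), the headline rate follows directly from the reported totals:

```python
# Sanity check on the reported overall harmful-output rate.
total_runs = 10_096_800        # N from the abstract
harmful_outputs = 1_180_000    # "1.18 million" as stated
print(f"overall rate: {harmful_outputs / total_runs:.1%}")  # ~11.7%, matching the abstract
```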

Conclusion

LLMs can generate harmful medical decisions at scale. A brief safety reminder reduces, but does not eliminate, this behavior. These results highlight the need to measure harm propensity as a core performance metric and to maintain guardrails and continuous physician oversight before integrating LLMs into clinical decision-making.
