LLMs Can Do Medical Harm: Stress-Testing Clinical Decisions Under Social Pressure
Abstract
Background
Large language models (LLMs) are entering clinical workflows, yet their effect on clinical decisions and potential for harm are uncertain.
Methods
We measured harmful decision output from an ensemble of 20 LLMs across more than 10 million clinical scenarios involving safety or ethical dilemmas. Each case was presented under a neutral control condition and six Milgram-style social-pressure conditions, with or without a brief mitigation cue (“verify or escalate if unsafe”). The primary outcome was the proportion of potentially harmful responses, compared with two-proportion and χ² tests and with confirmatory mixed-effects logistic models.
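To make the primary comparison concrete, the sketch below runs a χ² test and an equivalent two-proportion z-test on a 2×2 table of harmful vs. non-harmful outputs, with and without the mitigation cue. The counts are not the study data; they are assumed round numbers chosen only to match the reported rates and illustrate the shape of the calculation.

```python
# Illustrative sketch of the primary contrast: proportion of potentially harmful
# outputs with vs. without the mitigation cue, compared by a chi-square test.
# Counts below are ASSUMED for illustration (roughly matching 16.6% vs 10.1%),
# not the study's actual data.
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

# Each row: [harmful, not harmful] counts for one arm (assumed split of runs)
unmitigated = [830_000, 4_170_000]   # ~16.6% harmful of 5,000,000 assumed runs
mitigated   = [505_000, 4_495_000]   # ~10.1% harmful of 5,000,000 assumed runs

table = [unmitigated, mitigated]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")

# Equivalent two-proportion z-test on the same assumed counts
count = [unmitigated[0], mitigated[0]]
nobs = [sum(unmitigated), sum(mitigated)]
z, p_z = proportions_ztest(count, nobs)
print(f"z = {z:.1f}, p = {p_z:.3g}")
```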
Results
Across all runs (N = 10,096,800), LLMs produced 1.18 million potentially harmful outputs (11.7%). The mitigation cue reduced harmful decisions from 16.6% to 10.1% (p < 0.001). Under social pressure, models behaved predictably but unevenly: prompts framed as authority or responsibility transfer generated the most harmful responses, whereas the neutral, pressure-free control prompts produced the fewest (mitigated 8.3–9.6%; unmitigated 14.3–16.0%; χ², p < 0.001). In other words, when told what to do, or told that someone else would take responsibility, models were more likely to comply, even when the instruction was unsafe. These effects were consistent across datasets and models.
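The confirmatory analysis uses mixed-effects logistic models; as a simplified stand-in, the sketch below fits a fixed-effects logistic regression of harm on pressure condition and mitigation, with standard errors clustered by model (replacing the paper's random effects with cluster-robust inference). The data frame `df`, the file name, and the column names `harmful`, `condition`, `mitigated`, and `model_id` are assumptions for illustration, not the study's actual schema.

```python
# Simplified stand-in for the confirmatory analysis: logistic regression of
# harmful output on pressure condition and mitigation cue, with standard errors
# clustered by model. The paper uses mixed-effects logistic models; this sketch
# substitutes cluster-robust inference. All names below are assumed.
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per scenario run, with columns
#   harmful   - 1 if the response was judged potentially harmful, else 0
#   condition - "control" or one of the six social-pressure framings
#   mitigated - 1 if the "verify or escalate if unsafe" cue was shown, else 0
#   model_id  - which of the 20 LLMs produced the response
df = pd.read_csv("runs.csv")  # hypothetical file of per-run outcomes

fit = smf.logit(
    "harmful ~ C(condition, Treatment('control')) + mitigated", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["model_id"]})
print(fit.summary())
```

Exponentiating the condition coefficients gives the odds of a harmful response under each pressure framing relative to the neutral control, holding mitigation constant.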
Conclusion
LLMs can generate harmful medical decisions at scale. A brief safety reminder reduces, but does not eliminate, this behavior. These results highlight the need to measure harm propensity as a core performance metric and to maintain guardrails and continuous physician oversight before integrating LLMs into clinical decision-making.