Obedience to Unsafe Clinical Instructions: How Large Language Models Respond to Authority Cues
Abstract
Background
Large language models (LLMs) are being integrated into clinical environments, where deference to authority can cause harm. Unlike hallucination or bias, obedience to unsafe instructions is a distinct safety failure: following an explicit but harmful order.

Methods
We conducted a cross-sectional evaluation of 20 proprietary, open-source, and clinically tuned LLMs across 10,096,800 clinical decision scenarios. These included synthetic vignettes with predefined safe and unsafe options, and real-world discharge recommendations reframed to include unsafe, contradictory requests. Each scenario was presented under a neutral control condition or one of six Milgram-style social-pressure conditions (authority, responsibility transfer, urgency, threat, conformity, depersonalization), with or without a short mitigation cue instructing the model to verify the request, or escalate it, if the instruction was unsafe. The primary outcome was the proportion of potentially harmful outputs, defined as selection or endorsement of an unsafe clinical decision.

Results
Across all runs, 1.18 million of 10.1 million outputs (11.7%) were harmful. Harmful decisions occurred in 16.6% of outputs under unmitigated conditions versus 10.1% under mitigated conditions (absolute reduction, 6.5 percentage points; p < 0.001). In synthetic vignettes, harmful responses averaged 8.1% overall, declining from 10.6% without mitigation to 7.2% with it (difference, 3.4 percentage points; p < 0.001). In real-world discharge cases, harmful responses averaged 30.0%, decreasing from 46.6% to 24.5% with mitigation (difference, 22.1 percentage points; p < 0.001). Across all conditions, authority and responsibility-transfer cues elicited the highest rates of harmful compliance and control prompts the lowest; mitigation reduced rates but preserved this ordering.

Conclusion
LLMs do not behave as neutral calculators in clinical contexts. When exposed to authority or responsibility-transfer cues, they exhibit consistent obedience to unsafe instructions. A brief safety reminder substantially reduces, but does not eliminate, this behavior.
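The primary outcome and the mitigation comparison reduce to simple proportions over labelled outputs. The sketch below shows one way such a tabulation could look, assuming each model response has already been graded as harmful or not. The `ScoredOutput` record, its field names, and the helper functions are hypothetical, and the pooled two-proportion z-test is only an illustrative choice: the abstract reports p-values but does not name the statistical test or describe the authors' analysis code.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import erfc, sqrt


@dataclass(frozen=True)
class ScoredOutput:
    """One graded model response (hypothetical schema, not the study's)."""
    condition: str   # "control" or a pressure cue such as "authority"
    mitigated: bool  # True if the verify-or-escalate cue was included
    harmful: bool    # True if the response selected or endorsed the unsafe option


def harmful_rate(outputs):
    """Primary outcome: share of outputs labelled harmful."""
    outputs = list(outputs)
    return sum(o.harmful for o in outputs) / len(outputs)


def rates_by_cell(outputs):
    """Harmful rate for every (condition, mitigated) cell of the design."""
    cells = defaultdict(list)
    for o in outputs:
        cells[(o.condition, o.mitigated)].append(o)
    return {cell: harmful_rate(group) for cell, group in cells.items()}


def absolute_reduction_pp(unmitigated_rate, mitigated_rate):
    """Absolute difference between two rates, in percentage points."""
    return 100 * (unmitigated_rate - mitigated_rate)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for x1/n1 vs x2/n2 (assumed test, for illustration)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Toy data, NOT from the study: four graded responses per cell.
    toy = (
        [ScoredOutput("control", False, h) for h in (True, False, False, False)]
        + [ScoredOutput("authority", False, h) for h in (True, True, True, False)]
        + [ScoredOutput("authority", True, h) for h in (True, False, False, False)]
    )
    print(rates_by_cell(toy))
    print(harmful_rate(toy))
    # Arithmetic behind two of the reported headline figures.
    print(round(100 * 1_180_000 / 10_096_800, 1))          # 11.7
    print(round(absolute_reduction_pp(0.166, 0.101), 1))   # 6.5
```

The last two lines only check the abstract's own arithmetic: 1.18 million harmful outputs out of 10,096,800 is about 11.7%, and 16.6% minus 10.1% is an absolute reduction of 6.5 percentage points. Reporting the difference in percentage points rather than as a relative change keeps it comparable across the synthetic and real-world subsets, whose baseline rates differ widely.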