When Agentic LLMs Trust Poisoned Tools: Vulnerability of Clinical LLMs to Adversarial Guidelines
Abstract
Agentic large language models (LLMs) increasingly rely on retrieved sources and tools, but their ability to reject tools that have undergone adversarial modification is uncertain. We evaluated 21 LLMs on 500 physician-validated emergency department and inpatient vignettes spanning 12 medical domains. For each vignette, models chose between an authentic guideline excerpt and a sham version containing one adversarial modification, presented in random order (10,500 agentic decisions). Models selected the sham in 40.6% of evaluations (59.4% accuracy), with the highest failure rates for safety-critical changes, including removed warnings, deleted allergy information, contraindication violations, and dosing errors (54.2% to 61.7% failure). Choices were dominated by presentation bias: models favored the first option in 72.7% of decisions, and accuracy shifted from 36.7% to 82.3% depending on the sham's position. Guideline selection in agentic systems is therefore vulnerable to poisoned sources and may require independent verification and ranking safeguards before clinical deployment. This vulnerability is especially important because low-resource settings that rely on AI agents as primary public health gatekeepers face disproportionate risk from poisoned tools.
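The paired-choice protocol and position-bias analysis summarized above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' evaluation code; the record fields (model, vignette_id, sham_position, chosen_position) and function names are hypothetical.

```python
# Sketch of a paired-choice evaluation with randomized presentation order and
# position-bias analysis, as described in the abstract. Illustrative only.
from dataclasses import dataclass
from typing import Iterable
import random


@dataclass
class Decision:
    model: str
    vignette_id: int
    sham_position: int    # 0 = sham shown first, 1 = sham shown second
    chosen_position: int  # position of the excerpt the model selected


def present_pair(authentic: str, sham: str, rng: random.Random) -> tuple[list[str], int]:
    """Randomize presentation order; return the ordered pair and the sham's position."""
    if rng.random() < 0.5:
        return [sham, authentic], 0
    return [authentic, sham], 1


def summarize(decisions: Iterable[Decision]) -> dict[str, float]:
    """Overall accuracy, first-option preference, and accuracy conditioned on sham position."""
    decisions = list(decisions)
    n = len(decisions)
    accuracy = sum(d.chosen_position != d.sham_position for d in decisions) / n
    first_option_rate = sum(d.chosen_position == 0 for d in decisions) / n
    by_pos = {}
    for pos in (0, 1):
        subset = [d for d in decisions if d.sham_position == pos]
        if subset:
            by_pos[pos] = sum(d.chosen_position != d.sham_position for d in subset) / len(subset)
    return {
        "accuracy": accuracy,
        "first_option_rate": first_option_rate,
        "accuracy_sham_first": by_pos.get(0, float("nan")),
        "accuracy_sham_second": by_pos.get(1, float("nan")),
    }
```

Comparing `accuracy_sham_first` against `accuracy_sham_second` exposes the presentation bias reported in the abstract, where accuracy swings from 36.7% to 82.3% depending on where the poisoned excerpt appears.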