When Agentic LLMs Trust Poisoned Tools: Vulnerability of Clinical LLMs to Adversarial Guidelines
Abstract
Agentic large language models (LLMs) increasingly rely on retrieved sources and tools, but their ability to reject tools that have undergone adversarial modification is uncertain. We evaluated 21 LLMs on 500 physician-validated emergency department and inpatient vignettes spanning 12 medical domains. For each vignette, models chose between an authentic guideline excerpt and a sham version containing one adversarial modification, presented in random order (10,500 agentic decisions). Models selected the sham in 40.6% of evaluations (59.4% accuracy), with the highest failure rates for safety-critical changes, including removed warnings, deleted allergy information, contraindication violations, and dosing errors (54.2% to 61.7% failure). Choices were dominated by presentation bias: models favored the first option in 72.7% of decisions, and accuracy shifted from 36.7% to 82.3% depending on the sham's position. Guideline selection in agentic systems is therefore vulnerable to poisoned sources and may require independent verification and ranking safeguards before clinical deployment. This vulnerability is especially important because low-resource settings that rely on AI agents as primary public health gatekeepers face disproportionate risk from poisoned tools.
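The paired-choice protocol and position-bias analysis summarized above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' evaluation code; the record fields (model, vignette_id, sham_position, chosen_position) and function names are hypothetical.

```python
# Sketch of a paired-choice evaluation with randomized presentation order and
# position-bias analysis, as described in the abstract. Illustrative only.
from dataclasses import dataclass
from typing import Iterable
import random


@dataclass
class Decision:
    model: str
    vignette_id: int
    sham_position: int    # 0 = sham shown first, 1 = sham shown second
    chosen_position: int  # position of the excerpt the model selected


def present_pair(authentic: str, sham: str, rng: random.Random) -> tuple[list[str], int]:
    """Randomize presentation order; return the ordered pair and the sham's position."""
    if rng.random() < 0.5:
        return [sham, authentic], 0
    return [authentic, sham], 1


def summarize(decisions: Iterable[Decision]) -> dict[str, float]:
    """Overall accuracy, first-option preference, and accuracy conditioned on sham position."""
    decisions = list(decisions)
    n = len(decisions)
    accuracy = sum(d.chosen_position != d.sham_position for d in decisions) / n
    first_option_rate = sum(d.chosen_position == 0 for d in decisions) / n
    by_pos = {}
    for pos in (0, 1):
        subset = [d for d in decisions if d.sham_position == pos]
        if subset:
            by_pos[pos] = sum(d.chosen_position != d.sham_position for d in subset) / len(subset)
    return {
        "accuracy": accuracy,
        "first_option_rate": first_option_rate,
        "accuracy_sham_first": by_pos.get(0, float("nan")),
        "accuracy_sham_second": by_pos.get(1, float("nan")),
    }
```

Comparing `accuracy_sham_first` against `accuracy_sham_second` exposes the presentation bias reported in the abstract, where accuracy swings from 36.7% to 82.3% depending on where the poisoned excerpt appears.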