EviLedger: governing clinical AI with a verifiable evidence ledger

Rui Li
Shuang Cao
Ruihua Liu
Alexandre Duprey

Abstract

Clinical evidence evolves, yet most AI systems cannot reconstruct the evidence state supporting a past decision. We introduce EviLedger, an evidence ledger that converts guidelines, drug labels, and EHR events into immutable assertions linked to hashes and rollback. On 2,000 guideline-change events and 1,200 cross-source contradiction cases (Cohen’s κ = 0.87), EviLedger achieves drift F1 94.2% and contradiction F1 94.0%. A blinded semantic audit finds that 93.7% of extracted assertions are semantically supported by their evidence spans (κ = 0.84), and external validation on ESC/JCS guidelines yields F1 >90%. As an auditable memory layer for LLM-based retrieval, EviLedger reduces stale citations from 14.7% to 1.2% and unverifiable citations from 28.4% to 2.3%, while supporting p95 rollback in 5.13 s at 78M assertions. In a 6-month hospital pilot, EviLedger detects 5.5× more actionable guideline changes with 104× faster triage than manual surveillance.
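The abstract does not describe how EviLedger is implemented. As a rough illustration only, the idea of immutable assertions linked by hashes, with reconstruction of a past evidence state, can be sketched as a hash-chained append-only log. Every name below (`Assertion`, `Ledger`, `state_at`, the field names) is hypothetical and chosen for this sketch; none of it is taken from the preprint.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Assertion:
    source: str      # hypothetical label, e.g. "guideline", "drug_label", "ehr"
    claim: str       # the extracted assertion text
    span: str        # the evidence span the claim was extracted from
    timestamp: int   # logical time at which the assertion was recorded
    prev_hash: str   # digest of the previous entry, forming the chain

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so the hash is deterministic.
        payload = json.dumps(
            {
                "source": self.source,
                "claim": self.claim,
                "span": self.span,
                "timestamp": self.timestamp,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self) -> None:
        self._entries: List[Assertion] = []

    def append(self, source: str, claim: str, span: str, timestamp: int) -> str:
        if self._entries and timestamp < self._entries[-1].timestamp:
            raise ValueError("timestamps must be non-decreasing")
        prev = self._entries[-1].digest() if self._entries else self.GENESIS
        entry = Assertion(source, claim, span, timestamp, prev)
        self._entries.append(entry)
        return entry.digest()

    def verify(self) -> bool:
        # Recompute the chain; any altered entry breaks a later prev_hash link.
        prev = self.GENESIS
        for entry in self._entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.digest()
        return True

    def state_at(self, timestamp: int) -> List[Assertion]:
        # "Rollback" in this sketch: the assertions visible at a past time.
        return [e for e in self._entries if e.timestamp <= timestamp]
```

In this sketch, a decision made at logical time `t` could be audited by calling `state_at(t)` and checking `verify()`. A production system at the scale reported in the abstract would need indexing and persistent storage, which this example does not attempt.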

Version published to 10.21203/rs.3.rs-8769058/v1 on Research Square
Feb 20, 2026

When Agentic LLMs Trust Poisoned Tools: Vulnerability of Clinical LLMs to Adversarial Guidelines

This article has 5 authors:
1. Mahmud Omar
2. Alon Gorenshtein
3. Yiftach Barash
4. Girish Nadkarni
5. Eyal Klang
This article has no evaluations. Latest version: Feb 18, 2026

Large Language Models in Infectious Diseases: A Systemic Review

This article has 7 authors:
1. Alon Gorenshtein
2. Eyal Klang
3. Jacob J. Smith
4. Richard Dzeng
5. Mark C. Poznansky
6. Girish N Nadkarni
7. Mahmud Omar
This article has no evaluations. Latest version: Feb 18, 2026

Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models

This article has 22 authors:
1. Jiazhen Pan
2. Bailiang Jian
3. Paul Hager
4. Yundi Zhang
5. Che Liu
6. Friederike Jungmann
7. Hongwei Li
8. Chenyu You
9. Junde Wu
10. Jiayuan Zhu
11. Fenglin Liu
12. Yuyuan Liu
13. Niklas Bubeck
14. Christian Wachinger
15. Chen Chen
16. Zhenyu Gong
17. Cheng Ouyang
18. Georgios Kaissis
19. Benedikt Wiestler
20. Daniel Rückert
21. Julian Canisius
22. Moritz Knolle
This article has no evaluations. Latest version: Feb 18, 2026
