Evidence-Graded Decision Authorization for Safe Clinical AI: A Constrained Reasoning Framework

Che Lin
Jia-Yi Lin
Yao-San Lin

Abstract

Clinical AI systems have achieved strong predictive performance; however, prediction accuracy is not sufficient for clinical safety. Retrieval-augmented generation (RAG) improves factual accuracy, and general-purpose LLM guardrails constrain surface-level output safety, but neither mechanism governs the inferential gap between the available clinical evidence and the clinical claims that evidence permits. We propose Evidence-Graded Decision Authorization (EGDA), a framework that separates evidence extraction, sufficiency assessment, and claim-level authorization through domain-specific rules. In a controlled experiment using 60 breast cancer decision-snapshot cases (1,260 system outputs across three arms, evaluated by LLM-as-Judge with expert calibration), EGDA reduced the unjustified inference rate to 8.0% (vs. 48.7% for the unconstrained LLM and 47.7% for RAG; risk difference vs. unconstrained −40.7 percentage points, 95% CI −46.9 to −34.0, p < 0.001), raised the appropriate refusal rate to 95.0% (vs. 56.9% and 56.9%; risk difference vs. unconstrained +38.1 percentage points, 95% CI +31.3 to +44.5, p < 0.001), and achieved the highest factual correctness at 96.4% (vs. 69.8% and 74.5%). Unexpectedly, RAG without an authorization gate failed to reduce unjustified inference relative to the unconstrained baseline (47.7% vs. 48.7%, p = 0.870) and produced no improvement in appropriate refusal (56.9% vs. 56.9%, p = 1.0), showing that information supply alone is not sufficient for inferential governance. We argue that domain-specific, evidence-graded reasoning governance should serve as a deployment reference standard for safety-critical clinical AI.
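
To make the three-stage separation described in the abstract concrete, the following is a minimal sketch of how evidence extraction, sufficiency assessment, and claim-level authorization might be kept apart in code. It is not the authors' implementation: the grade scale (`Grade`), the rule table (`RULES`), the claim names, and the snapshot field names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    """Hypothetical evidence grades, ordered weakest to strongest."""
    ABSENT = 0
    INDIRECT = 1
    DIRECT = 2


@dataclass(frozen=True)
class Decision:
    claim: str
    authorized: bool
    reason: str


# Hypothetical domain rules: for each claim type, the evidence fields it
# depends on and the minimum grade every one of them must reach.
RULES = {
    "recommend_endocrine_therapy": ({"er_status"}, Grade.DIRECT),
    "recommend_anti_her2_therapy": ({"her2_status"}, Grade.DIRECT),
}


def extract_evidence(snapshot):
    """Stage 1: keep only findings that are actually documented."""
    return {field: item for field, item in snapshot.items() if item is not None}


def assess_sufficiency(evidence, required_fields):
    """Stage 2: weakest grade among required fields; ABSENT if any is missing."""
    return min(
        (evidence[f][1] if f in evidence else Grade.ABSENT for f in required_fields),
        default=Grade.ABSENT,
    )


def authorize(claim, evidence):
    """Stage 3: allow a claim only if a rule covers it and its grade is met."""
    if claim not in RULES:
        # Default deny: a claim with no governing rule is never authorized.
        return Decision(claim, False, "no rule covers this claim")
    required_fields, minimum = RULES[claim]
    grade = assess_sufficiency(evidence, required_fields)
    if grade >= minimum:
        return Decision(claim, True, f"evidence grade {grade.name}")
    return Decision(claim, False, f"insufficient evidence ({grade.name})")


# Illustrative snapshot: ER status is documented, HER2 status is not.
snapshot = {"er_status": ("positive", Grade.DIRECT), "her2_status": None}
evidence = extract_evidence(snapshot)
endocrine = authorize("recommend_endocrine_therapy", evidence)
anti_her2 = authorize("recommend_anti_her2_therapy", evidence)
```

In this sketch the endocrine-therapy claim is authorized, while the anti-HER2 claim is refused because its required evidence is absent. The point of the design is that refusal follows from the rule table rather than from the generator's own judgment, which is the kind of gate the abstract reports RAG alone does not provide.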

Highlights

Evidence-graded authorization is formalized for safer LLM clinical decisions
Three behavioral safety metrics gauge inferential governance, not accuracy
RAG-only systems do not reduce unjustified inference vs unconstrained LLM
Proposed framework cuts unjustified inference rate from 48.7% to 8.0%
Robustness confirmed across two model versions and via component ablation
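
The risk differences quoted in the abstract are absolute differences between arm-level rates, and the arithmetic can be checked directly from the reported percentages. The short sketch below does only that; the helper name `risk_difference` is an illustrative assumption, and the confidence intervals and p-values are not reproduced because the per-arm counts are not given on this page.

```python
def risk_difference(arm_pct, baseline_pct):
    """Absolute difference between two rates, in percentage points."""
    return round(arm_pct - baseline_pct, 1)


# Rates as reported in the abstract (percent).
unjustified = {"unconstrained": 48.7, "rag": 47.7, "egda": 8.0}
refusal = {"unconstrained": 56.9, "rag": 56.9, "egda": 95.0}

# EGDA vs. the unconstrained LLM.
rd_unjustified = risk_difference(unjustified["egda"], unjustified["unconstrained"])
rd_refusal = risk_difference(refusal["egda"], refusal["unconstrained"])

# RAG vs. the unconstrained LLM.
rd_rag_unjustified = risk_difference(unjustified["rag"], unjustified["unconstrained"])
rd_rag_refusal = risk_difference(refusal["rag"], refusal["unconstrained"])
```

These values come out to −40.7 and +38.1 percentage points for EGDA, matching the abstract, and to −1.0 and 0.0 for RAG, consistent with the reported lack of a significant difference from the unconstrained baseline.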

Version published to 10.64898/2026.05.19.26353565 on medRxiv
May 22, 2026
