Agentic AI in Healthcare: Bridging the Gap Between Computational Promise and Clinical Evidence

Yunguo Yu

Abstract

Background: Agentic AI systems are increasingly proposed for healthcare applications, yet the evidence base distinguishing computational promise from clinical reality remains poorly characterised. Single-agent systems offer efficiency for routine diagnostics; multi-agent systems promise robustness for complex care. Both face barriers in safety, accountability, and equitable deployment.
Methods: We conducted a PRISMA-ScR scoping review synthesising evidence from 161 studies (January 2018–October 2024, with selective early-access coverage through April 2026) retrieved from PubMed, IEEE Xplore, arXiv, Google Scholar, and Scopus. Evidence certainty was graded using an adapted GRADE-informed framework appropriate for heterogeneous clinical and simulation evidence. Given substantial heterogeneity across architectures, tasks, and outcome measures, quantitative pooling was not appropriate; we employed structured evidence mapping and narrative synthesis. A pragmatic, deployment-focused definition of “agent” was adopted and extended with a five-level Agentic Capability Spectrum (Levels 0–4) to preserve discriminative power.
Results: High-certainty evidence supports selected single-agent systems in specialised diagnostic domains (94.5% accuracy in retinal screening; AUC 0.96 in skin-cancer classification). Very low-certainty evidence from simulation studies suggests potential coordination advantages for multi-agent systems, with no confirmed clinical deployment. Multi-agent systems require substantially higher computational resources and introduce coordination latency (200–500 ms in simulation). Cross-cutting barriers include algorithmic bias in one commercial population-health algorithm (Moderate-certainty; Obermeyer et al., 2019; generalisability uncertain), unclear liability frameworks, and workflow-integration failures. Evidence is predominantly from high-income countries (87% of studies; descriptive evidence-mapping finding).
Conclusions: Single-agent systems demonstrate validated clinical utility in constrained tasks, whereas multi-agent systems remain experimental. Priorities include large-scale clinical trials for multi-agent architectures, standardised safety frameworks, risk-based regulatory pathways, and equity-focused global deployment strategies. As this synthesis was conducted by a single reviewer, all findings represent a preliminary structured synthesis requiring independent replication before informing clinical guideline development.

Version published to 10.21203/rs.3.rs-9374197/v1 on Research Square
Apr 14, 2026

Related articles

MedAgent: A Retrieval-Augmented Clinical Decision Support Agent with Verifiable Evidence Grounding for Evidence-Based Medicine

This article has 3 authors:
1. Fuqiang Wang
2. Zhicai Guo
3. Zhikang Ye
This article has no evaluations. Latest version: Jun 17, 2026

An Agent-Based Modeling Framework for Healthcare AI Adoption: Application to Ambient Clinical Documentation

This article has 1 author:
1. Matthew G Crowson
This article has no evaluations. Latest version: Jul 2, 2026

PHO-Agents: A Large Language Model–Powered Multi-Agent System for Predicting Health Outcomes

This article has 6 authors:
1. Daling Shi
2. Tyler Shugg
3. Michael T. Eadon
4. Jing Su
5. Yijiang Chen
6. Qianqian Song
This article has no evaluations. Latest version: Jun 29, 2026
