Agentic AI in Healthcare: Bridging the Gap Between Computational Promise and Clinical Evidence

Abstract

Background: Agentic AI systems are increasingly proposed for healthcare applications, yet the evidence base distinguishing computational promise from clinical reality remains poorly characterised. Single-agent systems offer efficiency for routine diagnostics; multi-agent systems promise robustness for complex care. Both face barriers in safety, accountability, and equitable deployment.
Methods: We conducted a PRISMA-ScR scoping review synthesising evidence from 161 studies (January 2018–October 2024, with selective early-access coverage through April 2026) retrieved from PubMed, IEEE Xplore, arXiv, Google Scholar, and Scopus. Evidence certainty was graded using an adapted GRADE-informed framework appropriate for heterogeneous clinical and simulation evidence. Given substantial heterogeneity across architectures, tasks, and outcome measures, quantitative pooling was not appropriate; we employed structured evidence mapping and narrative synthesis. A pragmatic, deployment-focused definition of “agent” was adopted and extended with a five-level Agentic Capability Spectrum (Levels 0–4) to preserve discriminative power.
Results: High-certainty evidence supports selected single-agent systems in specialised diagnostic domains (94.5% accuracy in retinal screening; AUC 0.96 in skin-cancer classification). Very low-certainty evidence from simulation studies suggests potential coordination advantages for multi-agent systems, with no confirmed clinical deployment. Multi-agent systems require substantially higher computational resources and introduce coordination latency (200–500 ms in simulation). Cross-cutting barriers include algorithmic bias in one commercial population-health algorithm (Moderate-certainty; Obermeyer et al., 2019; generalisability uncertain), unclear liability frameworks, and workflow-integration failures. Evidence is predominantly from high-income countries (87% of studies; descriptive evidence-mapping finding).
Conclusions: Single-agent systems demonstrate validated clinical utility in constrained tasks, whereas multi-agent systems remain experimental. Priorities include large-scale clinical trials for multi-agent architectures, standardised safety frameworks, risk-based regulatory pathways, and equity-focused global deployment strategies. As this synthesis was conducted by a single reviewer, all findings represent a preliminary structured synthesis requiring independent replication before informing clinical guideline development.
