A Survey on LLM-based Multi-Agent AI Hospital

Abstract

AI hospitals are workflow-level multi-agent systems built on large language models that run inside clinical processes. Agents take explicit roles, maintain shared state through handoffs, use EHR- and guideline-grounded tools, and operate under safety gateways with audit logs. Prior work is rich but fragmented across tasks and settings. This survey defines the scope and boundaries of AI hospitals and compiles designs into a compact taxonomy with head-to-head trade-off matrices. We introduce a layered evaluation stack that measures safety, clinical processes, outcomes, and operations (e.g., time-to-disposition, throughput, and token/latency costs), and we use Integration Readiness Levels (IRL1–IRL6) to gate autonomy from sandbox to deployment, with required logs and pass criteria. To make deployment claims testable, we map key integration tasks to minimal instrumentation and formulate several challenges as workflow-failure mechanisms with concrete tests and IRL gates. We close with a practical roadmap on workflow-aware memory, queue-aware planning, escalation learning, traceability, and playbook adoption.
