Orchestrated multi agents sustain accuracy under clinical-scale workloads compared to a single agent

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

We tested state-of-the-art large language models (LLMs) in two configurations for clinical-scale workloads: a single agent handling heterogeneous tasks versus an orchestrated multi-agent system assigning each task to a dedicated worker. Across retrieval, extraction, and dosing calculations, we varied batch sizes from 5 to 80 to simulate clinical traffic. Multi-agent runs maintained high accuracy under load (pooled accuracy 90.6% at 5 tasks, 65.3% at 80) while single-agent accuracy fell sharply (73.1% to 16.6%), with significant differences beyond 10 tasks (FDR-adjusted p < 0.01). Multi-agent execution reduced token usage up to 65-fold and limited latency growth compared with single-agent runs. The design’s isolation of tasks prevented context interference and preserved performance across four diverse LLM checkpoints. This is the first evaluation of LLM agent architectures under sustained, mixed-task clinical workloads, showing that lightweight orchestration can deliver accuracy, efficiency, and auditability at operational scale.

Article activity feed