STELLA: Towards a Biomedical World Model with Self-Evolving Multimodal Agents
Abstract
The staggering complexity of modern biomedical research has intensified the aspiration for a generalist “Biomedical World Model”, yet current AI agents remain constrained by static capabilities and a lack of self-evolution mechanisms. To bridge this gap, we present STELLA, a self-evolving multimodal agent designed to progressively refine its computational reasoning and physical execution through interaction. STELLA operates via a collaborative multi-agent framework (comprising Manager, Developer, Critic, and Tool Creation agents) that continuously updates reasoning templates and autonomously expands a dynamic “Tool Ocean”. We demonstrate STELLA’s capabilities on our newly created Tool Creation Benchmark, where it attains a score of 4.01/5 with 100% task completion, significantly outperforming state-of-the-art models including GPT-5, Claude 4 Opus, and Biomni. Beyond computational metrics, STELLA drives experimentally validated scientific discovery. In oncology, the agent identified Butyrophilin Subfamily 3 Member A1 (BTN3A1) as a novel negative regulator of natural killer (NK) cell function in acute myeloid leukemia (AML), a finding verified via CRISPR knockout studies. In protein engineering, STELLA orchestrated a complete directed evolution workflow for the enzyme strictosidine synthase, identifying variants, notably M276L, with more than a two-fold improvement in catalytic activity. Finally, the system extends to physical laboratory automation by training Vision-Language-Action (VLA) models through a Decompose-Monitor-Recover mechanism, which increased success rates from 17% to 82%. By integrating autonomous tool evolution, biological discovery, and robotic control, STELLA offers a blueprint for a self-evolving world model in the life sciences.