Med-AgentX: Multimodal Large Language Model Agents with Explainable Reinforcement Learning for Trustworthy Biomedical Decision Support
Abstract
Biomedical decision support systems face persistent challenges in integrating multimodal patient data and providing clinicians with trustworthy, transparent recommendations. While large language models (LLMs) and multimodal foundation models demonstrate strong reasoning across text, images, and structured records, their black-box nature and brittleness under noisy or adversarial conditions hinder deployment in safety-critical healthcare settings. We propose Med-AgentX, a multimodal LLM-agent framework enhanced with reinforcement learning (RL) and attribution-based explainability. Med-AgentX fuses clinical text, imaging, and structured data through cross-modal attention, while an RL policy optimizes for diagnostic accuracy, robustness, and explanation fidelity. Human-in-the-loop feedback further aligns the system with clinical expertise. Experiments on benchmark datasets demonstrate that Med-AgentX outperforms strong baselines in predictive accuracy and robustness, while offering interpretable rationales validated by clinicians. Taken together, Med-AgentX represents a step toward next-generation biomedical AI that is powerful, accountable, and clinician-aligned.
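The abstract describes two core mechanisms only at a high level: cross-modal attention over text, imaging, and structured embeddings, and an RL reward balancing accuracy, robustness, and explanation fidelity. The sketch below is a minimal illustration of those two ideas, not the authors' implementation; the module structure, embedding dimension, and reward weights are assumptions for illustration only.

```python
# Illustrative sketch (not the Med-AgentX code): cross-modal attention fusion of
# per-modality embeddings, plus a composite RL reward combining accuracy,
# robustness, and explanation fidelity. Dimensions and weights are assumed.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse per-modality embeddings with multi-head cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, image, tabular):
        # One token per modality: (batch, 3, dim).
        tokens = torch.stack([text, image, tabular], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # each modality attends to the others
        return self.norm(fused.mean(dim=1))           # pooled joint representation


def composite_reward(accuracy: float, robustness: float, fidelity: float,
                     w_acc: float = 0.6, w_rob: float = 0.2, w_exp: float = 0.2) -> float:
    """Scalar reward for the RL policy; the weights here are illustrative."""
    return w_acc * accuracy + w_rob * robustness + w_exp * fidelity


if __name__ == "__main__":
    b, d = 2, 256
    fusion = CrossModalFusion(dim=d)
    z = fusion(torch.randn(b, d), torch.randn(b, d), torch.randn(b, d))
    print(z.shape)                          # torch.Size([2, 256])
    print(composite_reward(0.9, 0.8, 0.7))  # 0.84
```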