RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction
Abstract
The application of Large Language Models (LLMs) to various clinical applications has attracted growing research attention. LLMs currently achieve competitive results compared to human experts in examinations. However, real-world clinical decision-making differs significantly from the standardized, exam-style scenarios commonly used in current efforts. It therefore remains a challenge to apply LLMs to complex medical tasks that require a deep understanding of medical knowledge. A common approach is to fine-tune LLMs for target tasks; however, this not only requires substantial data and computational resources but also remains prone to generating ‘hallucinations’. In this paper, we present the RiskAgent system to perform a broad range of medical risk predictions, covering over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. RiskAgent is designed to collaborate with hundreds of clinical decision tools, i.e., risk calculators and scoring systems that are supported by evidence-based medicine. To evaluate our method, we have built MedRisk, the first benchmark specialized for risk prediction, including 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems. The results show that our RiskAgent, with 8 billion model parameters, achieves 76.33% accuracy, outperforming the most recent commercial LLMs, o1, o3-mini, and GPT-4.5, and doubling the 38.39% accuracy of GPT-4o. On rare diseases, e.g., Idiopathic Pulmonary Fibrosis (IPF), RiskAgent outperforms o1 and GPT-4.5 by 27.27% and 45.46% accuracy, respectively. Finally, we conduct a generalization evaluation on an external evidence-based diagnosis benchmark and show that our RiskAgent achieves the best results. These encouraging results demonstrate the great potential of our solution for diverse diagnosis domains. For example, instead of extensively fine-tuning LLMs for different medical tasks, our method, which collaborates with and utilizes existing evidence-based medical tools, not only achieves trustworthy results but also reduces resource costs, thus making LLMs accessible to resource-limited clinical applications. To improve the adaptability of our model in different scenarios, we have built and open-sourced a family of models ranging from 1 billion to 70 billion parameters. Our code, data, and models are all available at https://github.com/AI-in-Health/RiskAgent.