Evaluation and Enhancement of Large Language Models for In-Patient Diagnostic Support
Abstract
In-patient diagnosis demands complex clinical decision-making based on comprehensive patient information, posing critical challenges for clinicians. Despite advances in large language models (LLMs) for medical applications, limited research has focused on artificial intelligence (AI)-based in-patient diagnostic support systems, largely due to the lack of large-scale patient datasets. Moreover, existing medical benchmarks typically concentrate on medical question answering and examinations, overlooking the multifaceted nature of clinical decision-making in in-patient settings. To address these gaps, we first developed the In-Patient Diagnostic Support (IPDS) benchmark from the MIMIC-IV database, encompassing 51,274 cases across nine departments, 17 major disease categories, and 16 standardized treatment pathways. We then proposed the Multi-Agent In-patient Diagnostic Support (MAIDS) framework, comprising a triage agent that manages the patient admission flow, a clinician agent for each department serving as the primary decision maker, and a chief agent for each department overseeing disease assessments and treatment pathways. In experiments, MAIDS improved disease assessment accuracy by 20.70% over the state-of-the-art LLM. MAIDS also demonstrated significantly higher clinical compliance, outperforming three board-certified clinicians by 2%-4%, establishing a foundation for in-patient diagnostic support systems.
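To make the triage-clinician-chief flow described above concrete, the following is a minimal sketch of how such a multi-agent pipeline could be orchestrated. The class names (TriageAgent, ClinicianAgent, ChiefAgent), the prompt wording, and the call_llm stub are assumptions for illustration only; the paper's actual prompts and implementation are not given in this abstract.

```python
from dataclasses import dataclass

# Hypothetical sketch of a MAIDS-style control flow; agent names, prompts,
# and the call_llm helper are assumptions, not the authors' implementation.


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real model client here."""
    return "stub response to: " + prompt[:40]


@dataclass
class PatientCase:
    admission_note: str


class TriageAgent:
    """Routes an admitted patient to a department (assumed prompt design)."""

    def route(self, case: PatientCase) -> str:
        return call_llm(
            f"Assign a hospital department for this admission:\n{case.admission_note}"
        )


class ClinicianAgent:
    """Primary decision maker within a department."""

    def __init__(self, department: str):
        self.department = department

    def assess(self, case: PatientCase) -> str:
        return call_llm(
            f"As a {self.department} clinician, propose a disease assessment "
            f"and treatment pathway:\n{case.admission_note}"
        )


class ChiefAgent:
    """Oversees and finalizes the clinician's assessment and pathway."""

    def __init__(self, department: str):
        self.department = department

    def review(self, case: PatientCase, draft: str) -> str:
        return call_llm(
            f"As the {self.department} chief, review and finalize this plan:\n{draft}"
        )


def maids_pipeline(case: PatientCase) -> str:
    """Triage -> department clinician -> department chief, as sketched above."""
    department = TriageAgent().route(case)
    draft = ClinicianAgent(department).assess(case)
    return ChiefAgent(department).review(case, draft)


if __name__ == "__main__":
    case = PatientCase(admission_note="68-year-old with acute dyspnea and chest pain.")
    print(maids_pipeline(case))
```

With a real LLM client substituted for call_llm, the pipeline mirrors the admission flow summarized in the abstract: triage selects the department, the department's clinician agent drafts the assessment, and the department's chief agent reviews it before the final output.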