Evaluation and Enhancement of Large Language Models for In-Patient Diagnostic Support

Abstract

In-patient diagnosis demands complex clinical decision-making based on comprehensive patient information, posing critical challenges for clinicians. Despite advances in large language models (LLMs) for medical applications, research on artificial intelligence (AI) systems for in-patient diagnostic support has been limited, largely due to the lack of large-scale patient datasets. Moreover, existing medical benchmarks typically concentrate on medical question answering and examinations, overlooking the multifaceted nature of clinical decision-making in in-patient settings. To address these gaps, we first developed the In-Patient Diagnostic Support (IPDS) benchmark from the MIMIC-IV database, encompassing 51,274 cases across nine departments and 17 major disease categories, alongside 16 standardized treatment pathways. We then proposed the Multi-Agent In-patient Diagnostic Support (MAIDS) framework, comprising a triage agent that manages the patient admission flow, a clinician agent for each department that serves as the primary decision maker, and a chief agent for each department that oversees disease assessments and treatment pathways. In our experiments, MAIDS improved disease assessment accuracy by 20.70% over the state-of-the-art LLM. MAIDS also demonstrated significant clinical compliance, outperforming three board-certified clinicians by 2%-4%, establishing a foundation for in-patient diagnostic support systems.
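To make the described agent hierarchy concrete, the following is a minimal sketch of the triage-clinician-chief flow outlined in the abstract. All class names, prompts, and the `call_llm` helper are illustrative assumptions, not the authors' implementation; the actual framework details are given in the full paper.

```python
# Hypothetical sketch of a MAIDS-style multi-agent diagnostic flow.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM backend is used; plug in a real client here."""
    raise NotImplementedError("connect an LLM client to run this sketch")


@dataclass
class PatientCase:
    admission_note: str  # comprehensive patient information at admission


class TriageAgent:
    """Routes an admitted patient to one of the departments."""

    def route(self, case: PatientCase) -> str:
        return call_llm(
            "Assign the most appropriate department for this admission:\n"
            f"{case.admission_note}"
        )


class ClinicianAgent:
    """Primary decision maker within a department: proposes a disease assessment."""

    def __init__(self, department: str):
        self.department = department

    def assess(self, case: PatientCase) -> str:
        return call_llm(
            f"As a {self.department} clinician, assess the likely disease category:\n"
            f"{case.admission_note}"
        )


class ChiefAgent:
    """Department chief: reviews the assessment and selects a treatment pathway."""

    def __init__(self, department: str):
        self.department = department

    def review(self, case: PatientCase, assessment: str) -> str:
        return call_llm(
            f"As chief of {self.department}, verify this assessment and choose a "
            f"standardized treatment pathway:\nAssessment: {assessment}\n"
            f"Admission note: {case.admission_note}"
        )


def diagnose(case: PatientCase) -> str:
    """Run the triage -> clinician -> chief pipeline for one patient."""
    department = TriageAgent().route(case)
    assessment = ClinicianAgent(department).assess(case)
    return ChiefAgent(department).review(case, assessment)
```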
