VITRUVIUS: A conversational agent for real-time evidence based medical question answering
Abstract
Background
The application of Large Language Models (LLMs) to create conversational agents (CAs) that can aid health professionals in their daily practice is increasingly popular, mainly due to their ability to understand and communicate in natural language. Conversational agents can manage enormous amounts of information, comprehend and reason about clinical questions, extract information from reliable sources, and produce accurate answers to queries. This presents an opportunity for better access to up-to-date and trustworthy clinical information in response to medical queries.
Objective
We present the design and initial evaluation of Vitruvius, an agent specialized in answering queries about healthcare knowledge and evidence-based medical research.
Methodology
The model is based on a system of five LLMs, each instructed with a precise task, which together allow the system to automatically determine the best search strategy for providing an evidence-based answer. We assessed the system’s comprehension, reasoning, and retrieval capabilities using the public clinical question-answer dataset MedQA-USMLE. The model was improved in response to these assessments, and three versions were developed.
Results
We report the performance of the three versions of Vitruvius on a subset of 288 question-answer (QA) pairs (accuracy: V1 86%, V2 90%, V3 93%) and on the complete dataset of 1273 QA pairs (accuracy: V2 85%, V3 90.3%). We also evaluate intra- and inter-class variability and agreement. The final version of Vitruvius (V3) obtained a Cohen’s kappa of 87% and an accuracy of 90.26%, a state-of-the-art (SoTA) result that surpasses the current SoTA for other LLMs on the same dataset.
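For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and Cohen's kappa are conventionally computed for multiple-choice answers. It is not the authors' evaluation code: the function names, the option letters, and the toy answer lists are assumptions made purely for illustration, and the abstract does not specify which pairs of answer sets the kappa was computed over.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of questions where the predicted option equals the reference option."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohens_kappa(run_a, run_b):
    """Cohen's kappa: agreement between two answer sets, corrected for chance."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: share of questions answered identically.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Expected agreement: chance overlap given each set's label frequencies.
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sets use a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical, not taken from MedQA-USMLE or from the paper).
reference = ["A", "B", "C", "D"]
predicted = ["A", "B", "C", "A"]
print(accuracy(predicted, reference))

first_run = ["A", "A", "B", "B"]
second_run = ["A", "A", "B", "A"]
print(cohens_kappa(first_run, second_run))
```

Kappa is normally reported on a scale from -1 to 1, so a value written as 87% corresponds to 0.87 on that scale.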
Conclusions
Vitruvius demonstrates excellent performance in medical QA compared to standard database responses and other popular LLMs. Future investigations will focus on testing the model in a real-world clinical environment. While Vitruvius enhances productivity and aids healthcare professionals, it should not be used by individuals who are not qualified to reason with medical data, so that critical decision-making remains in the hands of trained professionals.