VITRUVIUS: A conversational agent for real-time evidence based medical question answering

Abstract

Background

Large Language Models (LLMs) are increasingly used to build conversational agents (CAs) that aid health professionals in their daily practice, mainly because of their ability to understand and communicate in natural language. Conversational agents can manage enormous amounts of information, comprehend and reason about clinical questions, extract information from reliable sources, and produce accurate answers to queries. This presents an opportunity to improve access to up-to-date, trustworthy clinical information in response to medical queries.

Objective

We present the design and initial evaluation of Vitruvius, an agent specialized in answering queries about healthcare knowledge and evidence-based medical research.

Methodology

The system comprises five LLMs, each instructed to perform a precise task, which together allow it to automatically determine the best search strategy for providing an evidence-based answer. We assessed the system’s comprehension, reasoning, and retrieval capabilities using the public clinical question-answer dataset MedQA-USMLE. The model was improved according to these assessments, and three versions were developed.

Results

We present the performance assessment for the three versions of Vitruvius on a subset of 288 question-answer (QA) pairs (accuracy: V1 86%, V2 90%, V3 93%) and on the complete dataset of 1273 QA pairs (accuracy: V2 85%, V3 90.3%). We also evaluate intra- and inter-class variability and agreement. The final version of Vitruvius (V3) obtained a Cohen’s kappa of 87% and a state-of-the-art (SoTA) accuracy of 90.26%, surpassing the current SoTA results of other LLMs on the same dataset.
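
The two metrics reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the toy answer lists are hypothetical, and the abstract does not state which pair of answer sets was compared for Cohen’s kappa, so the example assumes two repeated runs of the same system on multiple-choice questions.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of questions where the predicted option matches the reference."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("answer lists must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohens_kappa(run_a, run_b):
    """Cohen's kappa: agreement between two answer sets, corrected for chance."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("answer lists must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: share of questions where both runs chose the same option.
    p_observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Expected chance agreement, from each run's marginal option frequencies.
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both runs always give the same single option
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical, not taken from MedQA-USMLE).
reference = ["A", "B", "C", "D", "A", "B"]
run_1 = ["A", "B", "C", "D", "A", "B"]
run_2 = ["A", "B", "C", "D", "A", "C"]

print(f"accuracy run 1: {accuracy(run_1, reference):.3f}")
print(f"accuracy run 2: {accuracy(run_2, reference):.3f}")
print(f"kappa run 1 vs run 2: {cohens_kappa(run_1, run_2):.3f}")
```

Kappa is conventionally a value between -1 and 1, so a figure quoted as 87% corresponds to 0.87 on that scale.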

Conclusions

Vitruvius demonstrates excellent performance in medical QA when compared with standard database responses and with other popular LLMs. Future investigations will focus on testing the model in a real-world clinical environment. Although Vitruvius enhances productivity and aids healthcare professionals, it should not be used by individuals who are not qualified to reason with medical data, so that critical decision-making remains in the hands of trained professionals.
