VITRUVIUS: A conversational agent for real-time evidence based medical question answering

Maria Camila Villa
Isabella Llano
Natalia Castano-Villegas
Julian Martinez
Maria Fernanda Guevara
Jose Zea
Laura Velásquez

Abstract

Background

The application of Large Language Models (LLMs) to create conversational agents (CAs) that can aid health professionals in their daily practice is increasingly popular, mainly because LLMs can understand and communicate in natural language. Conversational agents can manage enormous amounts of information, comprehend and reason about clinical questions, extract information from reliable sources, and produce accurate answers to queries. This presents an opportunity for better access to up-to-date and trustworthy clinical information in response to medical queries.

Objective

We present the design and initial evaluation of Vitruvius, an agent specialized in answering queries about healthcare knowledge and evidence-based medical research.

Methodology

The model is based on a system of five LLMs, each instructed with a precise task, which allows the system to automatically determine the best search strategy for providing an evidence-based answer. We assessed the system’s comprehension, reasoning, and retrieval capabilities using the public clinical question-answer dataset MedQA-USMLE. The model was improved based on these assessments, and three versions were developed.

Results

We report the performance of the three versions of Vitruvius on a subset of 288 question-answer (QA) pairs (accuracy: V1 86%, V2 90%, V3 93%) and on the complete dataset of 1273 QA pairs (accuracy: V2 85%, V3 90.3%). We also evaluate intra- and inter-class variability and agreement. The final version of Vitruvius (V3) obtained a Cohen’s kappa of 87% and a state-of-the-art (SoTA) accuracy of 90.26%, surpassing the current SoTA results reported for other LLMs on the same dataset.
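
The two metrics reported above, accuracy and Cohen’s kappa, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `accuracy` and `cohen_kappa` and the toy answer lists are hypothetical, and it assumes each question has a single multiple-choice answer letter and that kappa is computed between two sets of answers to the same questions (for example, two runs of the same system).

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of questions where the predicted option matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both sequences agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both sequences were independent with the same marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): answer letters for six questions.
answer_key = ["A", "B", "C", "D", "A", "B"]
run_1 = ["A", "B", "C", "D", "A", "C"]
run_2 = ["A", "B", "C", "D", "B", "C"]

print(f"accuracy of run 1: {accuracy(run_1, answer_key):.3f}")
print(f"kappa between runs: {cohen_kappa(run_1, run_2):.3f}")
```

Kappa is conventionally reported on a scale where 1 means perfect agreement and 0 means chance-level agreement, so a value written as 87% corresponds to 0.87 on that scale.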

Conclusions

Vitruvius demonstrates excellent performance in medical QA compared to standard database responses and other popular LLMs. Future investigations will focus on testing the model in a real-world clinical environment. Although Vitruvius enhances productivity and aids healthcare professionals, it should not be used by individuals who are not qualified to reason with medical data, so that critical decision-making remains in the hands of trained professionals.

Version published to 10.1101/2024.10.03.24314861v1 on medRxiv
Oct 4, 2024

Related articles

An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

This article has 5 authors:
1. Roman Shusterman
2. Allison Waters
3. Shannon O’Neill
4. Phan Luu
5. Don Tucker
This article has no evaluations. Latest version: Oct 21, 2024

Improved precision oncology question-answering using agentic LLM

This article has 19 authors:
1. Rangan Das
2. K Maheswari
3. Shaheen Siddiqui
4. Nikita Arora
5. Ankush Paul
6. Jeet Nanshi
7. Varun Udbalkar
8. Apoorva Sarvade
9. Harsha Chaturvedi
10. Tammy Shvartsman
11. Shet Masih
12. R Thippeswamy
13. Shekar Patil
14. S S Nirni
15. Brian Garsson
16. Sanghamitra Bandyopadhyay
17. Ujjwal Maulik
18. Mohammed Farooq
19. Debarka Sengupta
This article has no evaluations. Latest version: Oct 5, 2024

Beyond Text Generation: Assessing Large Language Models' Ability to Follow Rules and Reason Logically

This article has 5 authors:
1. Zhiyong Han
2. Fortunato Battaglia
3. Kush Mansuria
4. Yoav Heyman
5. Stanley R. Terlecky
This article has no evaluations. Latest version: Oct 9, 2024
