Real-World Validation of MedSearch: A Conversational Agent for Real-Time, Evidence-Based Medical Question-Answering
Abstract
Introduction
The application of Large Language Model (LLM)-powered Conversational Agents (CAs) in healthcare has been evaluated using medical question-answering (QA) datasets, with excellent performance on international medical licensing exams [1, 2, 3]. However, multiple-choice questions fall short when the goal is to assess more complex language interactions and open-ended responses.
Objective
To evaluate the time invested and the validity of health care personnel's (HCP) responses to clinical questions using MedSearch compared with traditional, non-AI search methods.
Methods
This was a randomized, double-blind trial with 100 participants assigned to two groups. Each group answered four clinical cases with four questions each; one group used MedSearch, while the other used traditional search methods such as Google and PubMed, excluding any AI tools. Field specialists evaluated the responses on six aspects established to define answer validity. Time to respond was also recorded, described, and compared between the two groups.
Results
More than 70% of the sample were medical students. Differences between groups were statistically significant in all evaluated aspects (p < 0.01): the intervention (MedSearch) group arrived at a final answer in half the time of the control group (traditional search methods), roughly three minutes faster, with approximately 66% fewer searches per case. The model's answers were valid (accurate, current, aligned with consensus, and safe), with an average score of 2.8 on a scale of 1 to 3. Most MedSearch users found it useful for daily practice and would recommend it to colleagues.
Conclusion
The present results suggest a positive impact of LLM-supported methods on the effectiveness of clinical search, without sacrificing, and in some aspects even improving, the quality of answers. Further clinical validation is needed to better understand the effect of LLM use in education and clinical practice, using larger samples and professionals from different fields.