Introducing Answered with Evidence - a framework for evaluating whether LLM responses to biomedical questions are founded in evidence
Abstract
The growing use of large language models (LLMs) for biomedical question answering raises concerns about the accuracy and evidentiary support of their responses. To address this, we present Answered with Evidence, a framework for evaluating whether LLM-generated answers are grounded in the scientific literature. We analyzed thousands of physician-submitted questions using a comparative pipeline across seven LLMs grounded in different evidence sources: six were grounded in PubMed or general online content, and the seventh in the Atropos Alexandria library of custom real-world analyses. We found that the general-purpose LLMs grounded in public information varied greatly in the answers they returned, even when those answers were sourced from the same publication. Using an ensemble approach, we observed that two or more LLMs agreed on an answer 49% of the time. Combined, the ensemble approach and the custom-built Alexandria source enabled reliable answers to over 64% of biomedical queries. As LLMs become increasingly capable of summarizing scientific content, maximizing their value will require systems that can accurately retrieve both published and custom-generated evidence, or generate reliable evidence in real time.
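A minimal sketch of the ensemble-agreement idea described in the abstract, assuming answers from each evidence-grounded source have already been normalized to a comparable form; the function name, threshold parameter, and example source labels are illustrative assumptions, not the authors' implementation.

from collections import Counter
from typing import Optional

def ensemble_answer(answers: dict[str, str], min_agreement: int = 2) -> Optional[str]:
    """Return an answer endorsed by at least `min_agreement` sources, else None.

    `answers` maps each evidence-grounded LLM source to its normalized answer
    for a single biomedical question.
    """
    counts = Counter(answers.values())
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer if top_count >= min_agreement else None

# Hypothetical example: two sources agree, so the ensemble accepts "yes".
print(ensemble_answer({
    "pubmed_llm_a": "yes",
    "pubmed_llm_b": "yes",
    "web_llm": "no",
    "alexandria_llm": "uncertain",
}))

Under this reading, a question counts as reliably answered when at least two independent sources converge on the same answer; questions with no such agreement would fall through to other evidence, such as the custom-built Alexandria analyses.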