OpenEvidence Clinical Question-Answering Platform: Systematic Review of Early Evaluations

Abstract

Background
OpenEvidence answers clinical questions using retrieval-augmented generation over curated sources with explicit citations. The company reports that over 40% of U.S. physicians (~400,000) consult it daily. Although several studies have evaluated the platform, the evidence base remains heterogeneous.

Objective
To systematically review studies evaluating OpenEvidence for clinical question answering and decision support.

Methods
A systematic search of MEDLINE/PubMed, Scopus, Web of Science, and Google Scholar was conducted from database inception to January 2026. Peer-reviewed studies evaluating OpenEvidence in clinical or clinically simulated contexts were included. Study selection, data extraction, and risk-of-bias assessment were performed independently by two reviewers in accordance with PRISMA guidelines.

Results
Eleven studies published between 2024 and 2026 were included in the analysis. OpenEvidence was evaluated as the primary platform in eight studies and as a comparator in three. OpenEvidence demonstrated the ability to generate evidence-supported responses and avoided fabricated citations. Performance was strongest in structured, guideline-based contexts. However, accuracy varied in complex clinical scenarios, and the platform often reinforced rather than altered clinical decisions. Limitations included dependence on the available retrieval context, interpretive errors despite accurate citations, variability across clinical domains, and the continuously updated nature of the system, which limits the generalizability and cross-study comparability of point-in-time evaluations.

Conclusions
The rapid adoption of OpenEvidence among clinicians has outpaced the sparse research available. With only 11 studies, most limited in scope or confined to niche clinical domains, the evidence remains thin relative to the platform's purported use in daily medical practice. Adoption of such platforms demands prospective real-world studies with clinical endpoints and ongoing benchmarking.
