The impact of large language models on diagnostic reasoning among LLM-trained physicians: a randomized clinical trial

Abstract

Large language models (LLMs) hold promise for improving clinical decision-making but can generate inaccurate information, making adequate physician training on LLM capabilities, limitations, and appropriate use essential before clinical deployment. However, it remains unknown whether such training translates into improved diagnostic reasoning when physicians have LLM access. We conducted a single-blind randomized controlled trial of 60 licensed physicians from multiple medical institutions in Pakistan, each trained in Artificial Intelligence (AI) with specific instruction on LLMs, between December 17, 2024, and May 17, 2025, using both remote and in-person sessions. After training, participants were randomized to either the LLM plus conventional resources or conventional resources alone, with 75 minutes allocated to review up to 6 clinical vignettes. The primary outcome was a diagnostic reasoning score (percentage) from a validated, expert-graded rubric assessing differential diagnosis, appropriateness of supporting and opposing factors, and next steps; the secondary outcome was time per vignette (seconds). Of the 58 physicians who completed the study, LLM users achieved a mean diagnostic reasoning score of 71.4% per case compared with 42.6% for conventional resources, an adjusted difference of 27.5 percentage points (95% CI, 22.8 to 32.2 pp; P < 0.001). Mean time per case was 603.8 seconds for LLM users compared with 635.0 seconds for the conventional resources group, an adjusted difference of -6.4 seconds (95% CI, -68.2 to 55.3; P = 0.84). Notably, the LLM alone outperformed the LLM-assisted physician group by 11.5 percentage points (95% CI, 5.5 to 17.5; P < 0.001).
This trial demonstrates that physicians trained to use an LLM, and able to consult it, significantly outperformed peers using conventional resources, underscoring the critical need for AI education to enable effective physician-AI collaboration in clinical practice (ClinicalTrials.gov: NCT06774612).