AI-literacy training enhances physician-LLM diagnostic collaboration in a resource-limited setting: a randomized controlled trial
Abstract
Diagnostic errors remain a pervasive yet preventable source of patient harm, with resource-limited healthcare systems in low- and middle-income countries (LMICs) facing disproportionately higher diagnostic error rates due to limited access to diagnostic tools, specialists, and decision support systems. Large language models (LLMs) offer potential to bridge diagnostic gaps in these settings but can generate inaccurate information, making comprehensive AI-literacy training for physicians essential before deployment. However, whether structured AI-literacy training translates into improved diagnostic reasoning remains unknown. We conducted a single-blind randomized controlled trial involving 60 licensed physicians from multiple medical institutions in Pakistan, an LMIC, between December 17, 2024, and May 17, 2025. Participants completed a novel 20-hour AI-literacy curriculum covering LLM capabilities, limitations, and appropriate use. Post-training, physicians were randomized to either LLM access plus conventional resources or conventional resources only, with 75 minutes allocated to review up to 6 clinical vignettes. The primary outcome was diagnostic reasoning score (percentage) from a validated, expert-graded rubric assessing differential diagnosis, supporting/opposing factor appropriateness, and next steps; the secondary outcome was time per vignette (seconds). Of the 58 physicians who completed the study, those with LLM access achieved mean diagnostic reasoning scores of 71.4% versus 42.6% with conventional medical resources alone, yielding an adjusted difference of 27.5 percentage points (95% CI, 22.8 to 32.2; P < 0.001). Mean time per case was similar between groups (603.8 vs. 635.0 seconds; adjusted difference -6.4 seconds, 95% CI -68.2 to 55.3; P = 0.84).
While the LLM alone outperformed the trained physician group by 11.5 percentage points (95% CI, 5.5 to 17.5; P < 0.001), in 38% of cases the physician-plus-LLM group surpassed the median LLM-alone performance, highlighting physician-AI complementarity. This trial suggests that AI-literacy training can enable physicians in resource-limited settings to effectively leverage LLMs for enhanced diagnostic reasoning, helping to address diagnostic gaps in LMICs (ClinicalTrials.gov: NCT06774612).