The impact of large language models on diagnostic reasoning among LLM-trained physicians: a randomized clinical trial

Abstract

Large language models (LLMs) hold promise for improving clinical decision-making but can generate inaccurate information, making adequate physician training on LLM capabilities, limitations, and appropriate use essential before clinical deployment. However, it remains unknown whether such training translates into improved diagnostic reasoning when physicians have LLM access. We conducted a single-blind randomized controlled trial of 60 licensed physicians from multiple medical institutions in Pakistan, each trained in Artificial Intelligence (AI) with specific instruction on LLMs, between December 17, 2024, and May 17, 2025, using both remote and in-person sessions. After training, participants were randomized to either the LLM plus conventional resources or conventional resources alone, with 75 minutes allocated to review up to 6 clinical vignettes. The primary outcome was a diagnostic reasoning score (percentage) from a validated, expert-graded rubric assessing differential diagnosis, appropriateness of supporting and opposing factors, and next steps; the secondary outcome was time per vignette (seconds). Of the 58 physicians who completed the study, LLM users achieved a mean diagnostic reasoning score of 71.4% per case compared with 42.6% for conventional resources, an adjusted difference of 27.5 percentage points (95% CI, 22.8 to 32.2 pp; P < 0.001). Mean time per case was 603.8 seconds for LLM users compared with 635.0 seconds for the conventional resources group, an adjusted difference of -6.4 seconds (95% CI, -68.2 to 55.3; P = 0.84). Notably, the LLM alone outperformed the LLM-assisted physician group by 11.5 percentage points (95% CI, 5.5 to 17.5; P < 0.001).
This trial demonstrates that physicians trained to use an LLM, and able to consult it, significantly outperformed peers using conventional resources, underscoring the critical need for AI education to enable effective physician-AI collaboration in clinical practice (ClinicalTrials.gov: NCT06774612).