Can LLMs Improve Healthcare Delivery? Evidence from Physician Review and Objective Testing


Abstract

We deployed a GPT-4-based large language model (LLM) decision support tool for health workers at two outpatient clinics in Nigeria. For each patient, health workers drafted care plans that they could optionally revise after LLM feedback. We compared unassisted and assisted plans using (i) blinded randomized assessments by on-site physicians who examined and treated the same patients and (ii) results from laboratory tests for common conditions. Academic physicians also performed blinded retrospective reviews of a subset of notes. Providers reported high satisfaction with LLM feedback, and retrospective academic reviewers rated LLM-assisted plans more favorably. However, on-site physicians observed little to no improvement in diagnostic alignment or treatment decisions. Objective testing showed mixed effects of LLM assistance, with reduced overtesting for malaria but increased overtesting for urinary tract infection and anemia. These findings highlight a gap between chart-based reviews and real-world clinical relevance that may be especially important in evaluating the effectiveness of LLM-based interventions.
