Can LLMs Improve Healthcare Delivery? Evidence from Physician Review and Objective Testing


Abstract

We deployed a GPT-4-based large language model (LLM) decision support tool for health workers at two outpatient clinics in Nigeria. For each patient, health workers drafted care plans that they could optionally revise after LLM feedback. We compared unassisted and assisted plans using (i) blinded randomized assessments by on-site physicians who examined and treated the same patients and (ii) results from laboratory tests for common conditions. Academic physicians also performed blinded retrospective reviews of a subset of notes. Providers reported high satisfaction with LLM feedback, and retrospective academic reviewers rated LLM-assisted plans more favorably. However, on-site physicians observed little to no improvement in diagnostic alignment or treatment decisions. Objective testing showed mixed effects of LLM assistance, with reduced overtesting for malaria but increased overtesting for urinary tract infection and anemia. These findings highlight a gap between chart-based reviews and real-world clinical relevance that may be especially important in evaluating the effectiveness of LLM-based interventions.
