Verifiable Summarization of Electronic Health Records Using Large Language Models to Support Chart Review

Ritchie Verma
Emily Alsentzer
Zachary Strasser
Leslie Chang
Kirollos Roman
Esteban Gershanik
Camellia Hernandez
Miguel Linares
Jorge Rodriguez
Durga Thakral
Ozan Unlu
Jacqueline You
Li Zhou
David Bates

Abstract

Information overload in electronic health records (EHRs) hampers clinicians’ ability to efficiently extract and synthesize critical information from a patient’s longitudinal health record, leading to increased cognitive burden and delays in care. This study explores the potential of large language models (LLMs) to address this challenge by generating problem-based admission summaries for patients admitted with heart failure, a leading cause of hospitalization worldwide. We developed an extract-then-abstract approach guided by disease-specific “summary bundles” to generate summaries of longitudinal clinical notes that prioritize clinically relevant information. Through a mixed-methods evaluation using real-world clinical notes, we compared physicians’ ability to answer patient-specific clinical questions with the LLM-generated summaries versus standard chart review. While summary access did not significantly reduce overall questionnaire completion time, frequent summary use significantly contributed to faster questionnaire completion (p = 0.002). Individual physicians varied in how effectively they leveraged the summaries. Importantly, summary use maintained accuracy in answering clinical questions (88.0% with summaries vs. 86.4% without). All physicians indicated they were “likely” or “very likely” to use the summaries in clinical practice, and 87.5% reported that the summaries would save them time. Preferences for summary format varied, highlighting the need for customizable summaries aligned with individual clinician workflows. This study provides one of the first extrinsic evaluations of LLMs for longitudinal summarization, demonstrating their potential to enhance clinician efficiency, alleviate workload, and support informed decision-making in time-sensitive care environments.

Version published to 10.1101/2025.06.02.25328807 on medRxiv
Jun 3, 2025

Related articles

Implementation of Large Language Models in Electronic Health Records

This article has 3 authors:
1. Maxime Griot
2. Jean Vanderdonckt
3. Demet Yuksel
This article has no evaluations. Latest version: Jul 4, 2025

Leveraging Large Language Models on Automating Outpatients’ Message Classifications of Electronic Medical Records

This article has 3 authors:
1. Amima Shifa
2. G G Md Nawaz Ali
3. Roopa Foulger
This article has no evaluations. Latest version: Jul 3, 2025

CLEVER: Clinical Large Language Model Evaluation by Expert Review

This article has 4 authors:
1. Veysel Kocaman
2. Mustafa Kaya
3. Andrei Ferrer
4. David Talby
This article has no evaluations. Latest version: Jul 23, 2025
