Evaluating Large Language Models for Automatic Detection of In-Hospital Cardiac Arrest: Multi-Site Analysis of Clinical Notes

Uğurcan Vurgun
Aarthi Kaviyarasu
Sy Hwang
Ashley Batugo
Sunil Thomas
Brandon Tang
Ana Acevedo
Oscar Mitchell
Danielle L. Mowery

Abstract

In-hospital cardiac arrest (IHCA) affects over 200,000 patients annually in the United States, yet its detection through manual chart review remains resource-intensive and often delayed. We evaluated the performance of four open-source large language models (LLMs) and GPT-4o in identifying IHCA cases from 2,674 clinical notes across five hospitals. While GPT-4o achieved the highest performance (F1-score: 0.90, recall: 0.97), several open-source models demonstrated comparable capabilities, suggesting their viability for clinical applications. Our systematic analysis of model outputs revealed that performance was strongly influenced by site-specific documentation practices, with inter-site agreement rates varying by over 20%. Through detailed error analysis, we identified key challenges including medical terminology hallucinations and structural inconsistencies in model reasoning. These findings establish a framework for implementing LLM-based IHCA detection systems while highlighting critical considerations for their clinical deployment.
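
The abstract reports GPT-4o's F1-score and recall but not its precision. As an illustrative sketch that is not taken from the article, the precision implied by those two figures can be back-calculated from the standard definition of F1 as the harmonic mean of precision and recall. The function names below are hypothetical, and because the reported values are rounded, the result is only approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R.

    Rearranging gives P = F1 * R / (2R - F1).
    """
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 must be less than 2 * recall")
    return f1 * recall / denominator


# Rounded figures reported in the abstract for GPT-4o.
reported_f1 = 0.90
reported_recall = 0.97

# Assumption: both values are rounded to two decimals, so this is an estimate.
precision = implied_precision(reported_f1, reported_recall)
print(f"Implied precision: {precision:.2f}")
```

Under these assumptions the implied precision is roughly 0.84. A recall this far above precision would mean the model misses few true IHCA cases and flags some notes that are not.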

Version published to medRxiv on Aug 6, 2025 (DOI: 10.1101/2025.08.04.25331524)

