Prompts to Table: Specification and Iterative Refinement for Clinical Information Extraction with Large Language Models

David Hein
Alana Christie
Michael Holcomb
Bingqing Xie
AJ Jain
Joseph Vento
Neil Rakheja
Ameer Hamza Shakur
Scott Christley
Lindsey G. Cowell
James Brugarolas
Andrew R Jamieson
Payal Kapur

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Extracting structured data from free-text medical records at scale is laborious, and traditional approaches struggle in complex clinical domains. We present a novel, end-to-end pipeline leveraging large language models (LLMs) for highly accurate information extraction and normalization from unstructured pathology reports, focusing initially on kidney tumors. Our innovation combines flexible prompt templates, the direct production of analysis-ready tabular data, and a rigorous, human-in-the-loop iterative refinement process guided by a comprehensive error ontology. Applying the finalized pipeline to 2,297 kidney tumor reports with pre-existing templated data available for validation yielded a macro-averaged F1 of 0.99 for six kidney tumor subtypes and 0.97 for detecting kidney metastasis. We further demonstrate flexibility with multiple LLM backbones and adaptability to new domains utilizing publicly available breast and prostate cancer reports. Beyond performance metrics or pipeline specifics, we emphasize the critical importance of task definition, interdisciplinary collaboration, and complexity management in LLM-based clinical workflows.

Version published to 10.1101/2025.02.11.25322107v2 on medRxiv
Apr 1, 2025
Version published to 10.1101/2025.02.11.25322107v1 on medRxiv
Feb 13, 2025

Medication information extraction using local large language models

This article has 7 authors:
1. Phillip Richter-Pechanski
2. Marvin Seiferling
3. Christina KIRIAKOU
4. Dominic M. Schwab
5. Nicolas A. Geis
6. Christoph Dieterich
7. Anette Frank
This article has no evaluationsLatest version Mar 31, 2025
A Scalable Method for Validated Data Extraction from Electronic Health Records with Large Language Models

This article has 12 authors:
1. Timothy J. Stuhlmiller
2. AJ Rabe
3. Jeff Rapp
4. Penelope Manasco
5. Alaa Awawda
6. Hiba Kouser
7. Hugh Salamon
8. Donald Chuyka
9. William Mahoney
10. Kenny K. Wong
11. Glenn A. Kramer
12. Mark A. Shapiro
This article has no evaluationsLatest version Feb 26, 2025
How to Write Effective Prompts for Screening Biomedical Literature Using Large Language Models

This article has 5 authors:
1. Maria Teresa Colangelo
2. Stefano Guizzardi
3. Marco Meleti
4. Elena Calciolari
5. Carlo Galli
This article has no evaluationsLatest version Mar 11, 2025

Listed in

Abstract

Article activity feed

Related articles

Medication information extraction using local large language models

A Scalable Method for Validated Data Extraction from Electronic Health Records with Large Language Models

How to Write Effective Prompts for Screening Biomedical Literature Using Large Language Models