A Scalable Method for Validated Data Extraction from Electronic Health Records with Large Language Models

Timothy J. Stuhlmiller
AJ Rabe
Jeff Rapp
Penelope Manasco
Alaa Awawda
Hiba Kouser
Hugh Salamon
Donald Chuyka
William Mahoney
Kenny K. Wong
Glenn A. Kramer
Mark A. Shapiro

Abstract

Purpose

Extracting and structuring relevant clinical information from electronic health records (EHRs) remains a challenge due to the heterogeneity of systems, documents, and documentation practices. Large Language Models (LLMs) provide an approach to processing semi-structured and unstructured EHR data, enabling classification, extraction, and standardization.

Methods

Medical documents are processed through a structured data pipeline to generate normalized FHIR data. Unstructured data undergoes preprocessing, including optical character recognition, document parsing, text chunking, and embedding. Embeddings enable search and classification, which in turn facilitate document retrieval for extraction. LLMs perform named entity recognition and relation extraction, with outputs mapped to FHIR R4 and OMOP and harmonized with pre-structured data for interoperability. Model performance is evaluated through human validation and automated consistency checks. Iterative refinement, error analysis, and standardized schema selection optimize the data for analytics and downstream workflows.
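
The chunking and FHIR-mapping steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chunk sizes, the input field names (`drug_name`, `dose_text`, `indication`, `discontinuation_reason`), and the choice of `MedicationStatement` as the target resource are all assumptions made for illustration. Only the FHIR R4 element names (`medicationCodeableConcept`, `reasonCode`, `statusReason`, `dosage`, `subject`) come from the FHIR R4 specification.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical sizes)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def to_medication_statement(record: dict, patient_id: str) -> dict:
    """Map one extracted medication record to a FHIR R4 MedicationStatement dict.

    The input keys are illustrative assumptions, not the paper's schema.
    """
    stopped = bool(record.get("discontinuation_reason"))
    resource = {
        "resourceType": "MedicationStatement",
        # Assumption: a stated discontinuation reason implies status "stopped".
        "status": "stopped" if stopped else "unknown",
        "medicationCodeableConcept": {"text": record["drug_name"]},
        "subject": {"reference": f"Patient/{patient_id}"},
    }
    if record.get("dose_text"):
        resource["dosage"] = [{"text": record["dose_text"]}]
    if record.get("indication"):
        resource["reasonCode"] = [{"text": record["indication"]}]
    if stopped:
        resource["statusReason"] = [{"text": record["discontinuation_reason"]}]
    return resource
```

In a real pipeline the free-text values would additionally be coded against a terminology such as RxNorm or SNOMED CT; the sketch keeps them as plain `text` to avoid inventing codes.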

Results

An LLM-based schema for medication extraction, validated on 34 patients, achieved ∼95% accuracy and F1-score across 7 data fields for 5,789 extracted medications. Deployment across 11,115 patients extracted 2.6 million medication records, increasing total medications by 27% and distinct drug ingredients by 31% over structured data in the EHR. LLM extraction increased oncology medications by 60%, distinct oncology therapies by 64%, and the number of patients with structured oncology medication data by 33%. The LLM enhanced data completeness, improving availability of indication for prescription (61% vs. 31%) and discontinuation reason (17% vs. 0%), outperforming pre-structured data in key clinical variables.
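
For readers who want to reproduce metrics of this kind, the following is a small sketch of how an F1-score, a per-field accuracy, and a percentage increase are commonly computed. The function names and the toy inputs are hypothetical; they are not the study's data or its evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def field_accuracy(extracted: list[dict], reference: list[dict], fields: list[str]) -> float:
    """Fraction of (record, field) pairs where the extracted value matches the reference."""
    total = len(reference) * len(fields)
    if total == 0:
        return 0.0
    matches = sum(
        ext.get(f) == ref.get(f)
        for ext, ref in zip(extracted, reference)
        for f in fields
    )
    return matches / total


def percent_increase(baseline: float, combined: float) -> float:
    """Relative gain of the combined count over the baseline count, in percent."""
    return 100.0 * (combined - baseline) / baseline


# Toy values only, chosen to show the calculation.
toy_f1 = f1_score(tp=95, fp=5, fn=5)
toy_gain = percent_increase(baseline=100, combined=127)
```

A percentage increase computed this way is relative to the structured-data baseline, which is how a statement such as "increasing total medications by 27%" would normally be read.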

Conclusion

An LLM-powered extraction process that employs embeddings, machine learning classification, schema-based extraction, and mapping of extracted information to healthcare data standards achieves a significant gain in clinically relevant information over pre-structured data available in the EHR.

Version published to 10.1101/2025.02.25.25322898v1 on medRxiv
Feb 26, 2025

Related articles

Medication information extraction using local large language models

This article has 7 authors:
1. Phillip Richter-Pechanski
2. Marvin Seiferling
3. Christina Kiriakou
4. Dominic M. Schwab
5. Nicolas A. Geis
6. Christoph Dieterich
7. Anette Frank
This article has no evaluations
Latest version Mar 31, 2025

Interoperable web platform based on large language models for medicals data analysis

This article has 7 authors:
1. Marcello Carvalho dos Reis
2. Rafaelly Rios dos Santos
3. Md Rafiul Hassan
4. Mohammad Mehedi Hassan
5. Daniel Santos da Silva
6. Pedro Lino Azevedo Landim
7. Victor Hugo Costa de Albuquerque
This article has no evaluations
Latest version Apr 3, 2025

AI-Assisted Data Extraction with a Large Language Model: A Study Within Reviews

This article has 18 authors:
1. Gerald Gartlehner
2. Shannon Kugley
3. Karen Crotty
4. Meera Viswanathan
5. Andreea Dobrescu
6. Barbara Nussbaumer-Streit
7. Graham Booth
8. Jonathan R. Treadwell
9. Jung Min Han
10. Jesse Wagner
11. Eric A. Apaydin
12. Erin L. Coppola
13. Margaret Maglione
14. Rainer Hilscher
15. Robert Chew
16. Meagan Pilar
17. Bryan Swanton
18. Leila C. Kahwati
This article has no evaluations
Latest version Mar 21, 2025

