SPELL-LLMs: A Scalable and Privacy-Compliant NLP Pipeline Using Locally Hosted Large Language Models for Clinical Information Extraction

Ricardo Kleinlein
Kathryn J. Gray
David Bates
Vesela P. Kovacheva

Abstract

Objective: Electronic health records (EHRs) contain valuable information for clinical research and decision making. However, leveraging these data remains challenging due to data heterogeneity, inconsistent documentation, missing information, and evolving terminology, especially within unstructured clinical notes. We developed a scalable, privacy-preserving natural language processing (NLP) workflow to systematically extract structured clinical insights from large volumes of clinical narratives.

Materials and Methods: Our platform employs a hybrid approach that combines regular expressions (regex), which rapidly identify relevant textual snippets, with locally hosted large language models (LLMs), which provide accurate clinical interpretation. All data processing occurs securely within institutional environments, adhering strictly to data privacy regulations. The modular Python-based workflow facilitates adaptation across institutions and is optimized for computational efficiency, supporting high-throughput processing even in resource-limited settings.

Results: The pipeline efficiently processed millions of clinical reports (1976–2024) from multiple hospitals. By analyzing targeted snippets rather than entire documents, our approach reduced processing time by 80% compared to traditional full-document LLM inference, and by 97% compared to manual physician annotation. Accuracy was rigorously validated using three obstetric tasks: extraction of numerical values (blood loss volumes), dates (estimated delivery dates), and diagnoses (hemolysis, elevated liver enzymes, and low platelets [HELLP] syndrome), achieving 95% agreement with expert annotations. Generalizability was further confirmed by accurately identifying ventricular tachycardia diagnoses in the publicly available MTSamples dataset.

Discussion and Conclusions: Our hybrid NLP framework significantly enhances the usability of unstructured EHR data for clinical research, decision support, and large-scale retrospective analyses.
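The hybrid approach described in the abstract (regex to locate relevant snippets, then an LLM to interpret only those snippets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pattern, the window size, the prompt wording, and the names `extract_snippets`, `interpret_snippets`, and `fake_llm` are all assumptions made for illustration, and `fake_llm` is only a stand-in for a call to a locally hosted model.

```python
import re
from typing import Callable, List

# Hypothetical pattern for blood-loss mentions; not taken from the paper.
BLOOD_LOSS_PATTERN = re.compile(r"\b(?:EBL|QBL|blood loss)\b", re.IGNORECASE)


def extract_snippets(text: str, pattern: re.Pattern, window: int = 60) -> List[str]:
    """Return character windows around each regex match, merging overlaps."""
    spans = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if spans and start <= spans[-1][1]:
            # Overlaps the previous window: extend it instead of duplicating text.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return [text[s:e] for s, e in spans]


def interpret_snippets(snippets: List[str], llm: Callable[[str], str]) -> List[str]:
    """Send each snippet (not the whole note) to the model and collect answers."""
    template = (
        "Extract the blood loss volume in mL from this note excerpt, "
        "or answer 'none':\n{snippet}"
    )
    return [llm(template.format(snippet=s)) for s in snippets]


def fake_llm(prompt: str) -> str:
    """Stand-in for a locally hosted model (assumption, for demonstration only)."""
    found = re.search(r"(\d+)\s*m[lL]", prompt)
    return found.group(1) if found else "none"


note = (
    "Patient delivered at 39 weeks. "
    + "Unrelated history text. " * 20
    + "Estimated blood loss 800 mL, uterotonics given. "
    + "Follow-up routine. " * 20
)
snippets = extract_snippets(note, BLOOD_LOSS_PATTERN)
answers = interpret_snippets(snippets, fake_llm)
```

In this toy note the model sees one short window rather than the full document, which is the source of the processing-time savings the abstract reports. In a real deployment `fake_llm` would be replaced by whatever inference interface the institution's local model exposes.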

Version published to 10.1101/2025.07.25.25332130 on medRxiv
Jul 25, 2025

Related articles

Privacy Protection for Chinese Electronic Medical Records Using Large Language Models: Effectiveness Evaluation and Application of LLM Models in Medical Data Tasks

This article has 11 authors:
1. Gong Mengchun
2. Ouyang Zihao
3. Ma Dandan
4. Cai Endi
5. Liu Chao
6. Shi Wenzhao
7. Zhang Bohan
8. Ma Lian
9. Wei Yuna
10. Jiang Huizhen
11. Zhou Xiang
This article has no evaluations. Latest version: Jul 28, 2025

Automated De-Identification, Consistent Obfuscation, and Regulatory Grade Validation of 2 Billion Patient Notes

This article has 9 authors:
1. Veysel Kocaman
2. Lindsay Mico
3. Mustafa Aytug Kaya
4. Nadaa Taiyab
5. David Talby
6. Tae Surh
7. Yuqing Guo
8. Vivek Tomer
9. Robert Kramer
This article has no evaluations. Latest version: Sep 5, 2025

Query Augmented Generation (QAG) from the Genomic Data Commons for Accurate Variant Statistics

This article has 7 authors:
1. Aarti Venkat
2. William P. Wysocki
3. Michael Lukowski
4. Steven Song
5. Anirudh Subramanyam
6. Zhenyu Zhang
7. Robert L. Grossman
This article has no evaluations. Latest version: Sep 7, 2025
