Developing a scalable pipeline for data extraction from clinical letters through resource-efficient prompt engineering

Abstract

Free-text clinical records represent an untapped wealth of data for secondary use. Their potential is limited by the resource demands of accurate information extraction at scale. We introduce a scalable, resource-efficient, high-performance pipeline that leverages large language models (LLMs) to address these challenges. It was developed and tested on real-world ophthalmic clinical letters, each dually annotated by specialists. The pipeline achieved strong performance with a proprietary model in the development phase, yielding a maximum micro-averaged F1 score of 0.954 (95% CI 0.941–0.967) for diagnosis across nine conditions through iterative prompt refinement alone, and demonstrated strong generalisability (micro-F1 0.945–0.980) in temporal validation. The approach extended to two other proprietary models in the same family and was tested on 17 local models from seven open-weight LLM families, demonstrating robustness to model choice and deployment constraints (for models > 10B parameters). Beyond performance, we developed a multi-dimensional assessment for evaluating LLMs in data extraction tasks, introducing an error taxonomy to classify failure modes and Pareto frontier analyses to systematically map operational trade-offs (cost, time) across LLM configurations. A robust approach to operationalising LLMs in real-world workflows at scale may help lay the foundation for next-generation data pipelines that accelerate scientific discovery and power continuous learning health systems.
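The Pareto frontier analysis mentioned above can be sketched as follows. This is an illustrative example only, not the authors' code: each LLM configuration is scored on cost, runtime, and extraction F1, and a configuration is kept on the frontier if no other configuration is at least as good on every axis and strictly better on one. All model names and numbers below are hypothetical.

```python
def dominates(a, b):
    """Config a dominates b if it is no worse on every axis
    (lower cost, lower time, higher F1) and strictly better on at least one."""
    no_worse = (a["cost"] <= b["cost"] and a["time"] <= b["time"]
                and a["f1"] >= b["f1"])
    strictly = (a["cost"] < b["cost"] or a["time"] < b["time"]
                or a["f1"] > b["f1"])
    return no_worse and strictly

def pareto_frontier(configs):
    """Return the configurations not dominated by any other configuration."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]

# Hypothetical configurations: (cost per letter in USD, seconds per letter, micro-F1)
configs = [
    {"model": "large-proprietary", "cost": 0.020, "time": 2.1, "f1": 0.954},
    {"model": "mid-open-13B",      "cost": 0.004, "time": 3.5, "f1": 0.942},
    {"model": "small-open-7B",     "cost": 0.002, "time": 1.8, "f1": 0.880},
    {"model": "mid-open-30B",      "cost": 0.006, "time": 4.0, "f1": 0.940},
]
frontier = pareto_frontier(configs)
# "mid-open-30B" is dominated by "mid-open-13B" (cheaper, faster, higher F1),
# so the frontier contains the other three configurations.
```

This exhaustive pairwise check is O(n²), which is fine for the small number of model configurations typically compared in such an evaluation.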
