Developing a scalable pipeline for data extraction from clinical letters through resource-efficient prompt engineering

Abstract

Free-text clinical records represent an untapped wealth of data for secondary use. Their potential is limited by the resource demands of accurate information extraction at scale. We introduce a scalable, resource-efficient, high-performance pipeline that leverages large language models (LLMs) to address these challenges. It was developed and tested on real-world ophthalmic clinical letters, each dually annotated by specialists. The pipeline achieved strong performance with a proprietary model in the development phase, yielding a maximum micro-averaged F1 score of 0.954 (95% CI 0.941–0.967) for diagnosis across nine conditions through iterative prompt refinement alone, and demonstrated strong generalisability (micro-F1 0.945–0.980) in temporal validation. The approach extended to two other proprietary models in the same family and was tested on 17 local models from seven open-weight LLM families, demonstrating robustness to model choice and deployment constraints (for models > 10B parameters). Beyond performance, we developed a multi-dimensional assessment for evaluating LLMs in data extraction tasks, introducing an error taxonomy to classify failure modes and Pareto frontier analyses to systematically map operational trade-offs (cost, time) across LLM configurations. A robust approach to operationalising LLMs in real-world workflows at scale may help lay the foundation for next-generation data pipelines that accelerate scientific discovery and power continuous learning health systems.
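The Pareto frontier analysis mentioned above can be sketched as follows. This is an illustrative example only, not the authors' code: each LLM configuration is scored on cost, runtime, and extraction F1, and a configuration is kept on the frontier if no other configuration is at least as good on every axis and strictly better on one. All model names and numbers below are hypothetical.

```python
def dominates(a, b):
    """Config a dominates b if it is no worse on every axis
    (lower cost, lower time, higher F1) and strictly better on at least one."""
    no_worse = (a["cost"] <= b["cost"] and a["time"] <= b["time"]
                and a["f1"] >= b["f1"])
    strictly = (a["cost"] < b["cost"] or a["time"] < b["time"]
                or a["f1"] > b["f1"])
    return no_worse and strictly

def pareto_frontier(configs):
    """Return the configurations not dominated by any other configuration."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]

# Hypothetical configurations: (cost per letter in USD, seconds per letter, micro-F1)
configs = [
    {"model": "large-proprietary", "cost": 0.020, "time": 2.1, "f1": 0.954},
    {"model": "mid-open-13B",      "cost": 0.004, "time": 3.5, "f1": 0.942},
    {"model": "small-open-7B",     "cost": 0.002, "time": 1.8, "f1": 0.880},
    {"model": "mid-open-30B",      "cost": 0.006, "time": 4.0, "f1": 0.940},
]
frontier = pareto_frontier(configs)
# "mid-open-30B" is dominated by "mid-open-13B" (cheaper, faster, higher F1),
# so the frontier contains the other three configurations.
```

This exhaustive pairwise check is O(n²), which is fine for the small number of model configurations typically compared in such an evaluation.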
