Transformers and large language models are efficient feature extractors for electronic health record studies

Abstract

While unstructured free-text data is abundant in electronic health records, challenges in accurate and scalable information extraction often result in a preference for incorporating information from less specific clinical codes instead. We investigate the efficacy of modern natural language processing (NLP) methods and large language models (LLMs), in supervised and few-shot scenarios, for extracting features from 938,150 hospital antibiotic prescriptions from Oxfordshire, UK. A subset of the 4,000 most frequent indications for antibiotic use was labelled by clinical researchers into 11 categories describing the infection source/clinical syndrome, for model training. On separate internal (n = 2,000) and external (n = 2,000) test datasets, the fine-tuned domain-specific Bio+Clinical BERT model achieved average F1 scores of 0.97 and 0.98 respectively across the classes, outperforming traditional regex (F1 = 0.71 and 0.74) and n-grams/XGBoost (F1 = 0.86 and 0.84) approaches. A few-shot OpenAI GPT-4 model achieved F1 scores of 0.71 and 0.86 without using any labelled training data, and a fine-tuned GPT-3.5 model achieved F1 scores of 0.95 and 0.97. Fine-tuned BERT-based transformer models currently outperform LLMs on structured extraction tasks, while few-shot LLMs match the performance of traditional NLP without the need for labelling. Comparing infection sources extracted from ICD-10 codes with those parsed from free-text indications, the free-text indications revealed 31% more specific infection sources. Given their high accuracy, modern transformer-based models have the potential to be used widely throughout medicine to structure free-text records, providing more granular information than clinical codes and thereby facilitating better research and patient care.
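
The fine-tuned approach described above is a standard sequence-classification setup. The following is a minimal sketch, assuming the publicly available `emilyalsentzer/Bio_ClinicalBERT` checkpoint on Hugging Face and illustrative toy data; the study's actual labelled indications, label set, and hyperparameters are not reproduced here.

```python
# Hypothetical sketch: fine-tuning Bio+Clinical BERT for 11-class
# classification of free-text antibiotic indications.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_CLASSES = 11  # infection source / clinical syndrome categories

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT", num_labels=NUM_CLASSES)

class IndicationDataset(Dataset):
    """Wraps free-text indications and integer labels for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True,
                                   max_length=64)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Toy examples; the study used 4,000 clinician-labelled indications.
train_ds = IndicationDataset(
    ["community acquired pneumonia", "urinary tract infection"], [0, 1])

args = TrainingArguments(output_dir="bert_indications",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```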
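
The few-shot LLM comparison can be sketched with a prompt containing a handful of in-context examples rather than labelled training data. The sketch below assumes the OpenAI Python client; the category names and example indications are illustrative, not the study's actual prompt or label set.

```python
# Hypothetical sketch: few-shot classification of an indication with GPT-4.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = """Classify the antibiotic prescription indication into one
infection source category.

Indication: "CAP" -> Respiratory
Indication: "UTI" -> Urinary
Indication: "cellulitis left leg" -> Skin and soft tissue

Indication: "{indication}" ->"""

def classify(indication: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output for a labelling task
        messages=[{"role": "user",
                   "content": FEW_SHOT.format(indication=indication)}],
    )
    return response.choices[0].message.content.strip()

print(classify("biliary sepsis"))
```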
