Language Models for standardising clinical notes and information extraction in addiction psychiatry – an empirical study

Abstract

Introduction: Electronic Health Records (EHR) contain both structured and unstructured data, with unstructured clinical notes (CNs) widely used in addiction psychiatry. CNs contain numerous errors and require proofreading before use in downstream applications. This study evaluates NLP methods and adapts a Language Model (LM) for proofreading CNs and extracting substance-related information.

Methods: We analysed CNs from a five-year addiction medicine EHR dataset (2018-2023), selecting 6,500 notes. The proofreading task involved correcting spelling and expanding abbreviations, while information extraction (IE) identified the presence of substance use and quantified the time since last use. Annotations by a team of doctors and nurses provided the gold standard. Against this standard, we compared the performance of existing solutions, including LMs, and adapted an LM for these tasks. The final model was also compared against a state-of-the-art commercial model (GPT-4o), and an acceptability-distinguishability experiment was conducted with masked raters.

Results: Proofreading improved readability and decreased the number of out-of-vocabulary words. LM-based solutions outperformed simpler approaches. The final model performed better than GPT-4o on both tasks. Human evaluators found model-corrected CNs indistinguishable from human-proofread versions, correctly guessing the identity in only 27.9% of instances and preferring model outputs in 55.7%. On the IE task, overall performance was satisfactory (mean F1 0.99) but poor on rarer substance classes such as hallucinogens.

Discussion and Conclusions: Fine-tuned LMs effectively standardised CNs and extracted structured information from addiction psychiatry records. It is possible to adapt open-source LMs for bespoke tasks in addiction psychiatry with limited computational resources. This standardisation can enable large-scale dataset collation for deep-learning-based predictive modelling.
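The abstract reports that proofreading decreased out-of-vocabulary words but does not say how this was measured. The following is a minimal sketch of one plausible way to compute such a rate, not the study's actual method. The function name `oov_rate`, the toy vocabulary, and the example notes are all hypothetical and chosen only for illustration.

```python
import re


def oov_rate(text, vocabulary):
    """Return the fraction of alphabetic tokens in `text` absent from `vocabulary`.

    Assumption: tokens are lowercase runs of ASCII letters; the study may
    have used a different tokeniser and reference vocabulary.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0  # an empty note has no out-of-vocabulary tokens
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical toy vocabulary; a real one would come from a dictionary or corpus.
VOCAB = {"patient", "reports", "last", "alcohol", "use", "two", "days", "ago"}

# Hypothetical note before and after proofreading.
raw_note = "pt reprts last alchol use two days ago"
proofread_note = "patient reports last alcohol use two days ago"

raw_rate = oov_rate(raw_note, VOCAB)
proofread_rate = oov_rate(proofread_note, VOCAB)
```

In this toy case three of the eight raw tokens ("pt", "reprts", "alchol") fall outside the vocabulary, and none do after proofreading. A lower rate after proofreading is the kind of change the Results describe.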
