Automating the quality monitoring of a hospital discharge summary improvement project utilising large language models
Abstract
Quality improvement activities in healthcare are limited by the substantial time burden associated with manual clinical text review. To address this limitation within an established hospital discharge summary improvement project, we aimed to automate quality monitoring using large language models. Models were trained to identify ‘perfect’ content using clinician-graded data from 1,876 discharge summaries. Performance was evaluated on a held-out validation subset, and the trained models were then applied to 107,000 summaries covering the full project period. The models showed strong agreement with clinician-graded data, achieving F1 scores of 87 to 95 percent across targeted text fields. Automated processing enabled near-real-time evaluation of the entire dataset and revealed trends that were not detectable through traditional sampling methods. These findings demonstrate the feasibility of using large language models to increase the efficiency, coverage, and analytical depth of quality improvement and audit activities that rely on free-text review.
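As a minimal illustrative sketch (not the authors' code), per-field agreement between clinician grades and model outputs on a held-out validation subset could be computed as an F1 score; the field names, label encoding, and data layout below are assumptions made purely for illustration.

```python
# Illustrative sketch only: per-field F1 agreement between clinician grades
# and model predictions. All field names and labels are hypothetical.
from sklearn.metrics import f1_score

# Hypothetical binary labels: 1 = field content graded 'perfect', 0 = otherwise.
validation = {
    "diagnosis":   {"clinician": [1, 0, 1, 1, 0], "model": [1, 0, 1, 0, 0]},
    "medications": {"clinician": [1, 1, 0, 1, 1], "model": [1, 1, 0, 1, 0]},
}

for field, labels in validation.items():
    score = f1_score(labels["clinician"], labels["model"])
    print(f"{field}: F1 = {score:.2f}")
```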