Decoding Clinician Authorial Style: A Style-Informed Pipeline for Clinical Document Summary Generation with Large Language Models
Abstract
Large language models (LLMs) can automate clinical document summary generation, yet even clinically accurate outputs often fail to reflect individual clinicians’ writing styles, leading to substantial post-editing. We examine this stylistic gap using a multi-author corpus of de-identified clinical summaries. We propose a style-informed generation framework that extracts clinician-specific stylistic features through LLM feedback and applies a Train→Generate paradigm to produce personalized clinical summaries. Conventional metrics (ROUGE, BERTScore, cosine similarity) largely failed to distinguish intra-author from inter-author writing patterns, while Jaro-Winkler and BLEU showed only limited sensitivity. Targeted LLM-guided feature extraction, emphasizing rhythm, narration, and sentence or list structure, improved authorship classification accuracy to up to 73%. In blinded clinician A/B testing, GPT-4-generated drafts were preferred less often than original notes, whereas the Gemini 2.5 Pro pipeline produced drafts preferred at rates comparable to, and in some cases exceeding, clinician-authored summaries. Hallucination risks were mitigated through high-fidelity prompt engineering and explicit adherence to source-only data constraints. These results suggest that style-informed generation can narrow the style gap and produce clinically acceptable summaries that better align with each clinician’s voice.
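A toy sketch can illustrate why surface-overlap metrics such as bag-of-words cosine similarity track shared clinical content rather than authorial style (the notes and scores below are hypothetical examples, not drawn from the study’s corpus): two notes by different clinicians describing the same case still score highly, so the metric cannot separate authors.

```python
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words token counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical notes on the same case, written by two different clinicians
# with different styles: lexical overlap keeps the score high regardless.
note_clinician_a = "patient stable continue current meds follow up in two weeks"
note_clinician_b = "patient stable continue current meds review again in two weeks"

sim = cosine_similarity(note_clinician_a, note_clinician_b)
print(f"{sim:.2f}")  # high similarity despite different authors
```

Because content dominates such scores, the paper turns to LLM-guided extraction of style features (rhythm, narration, sentence or list structure) to recover author identity.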