Privacy-Preserving Retrieval for Auditable Clinical Language Modeling on Real-World Radiology Data
Abstract
Clinical large language models adapted for real-world use are commonly fine-tuned on patient data, which embeds confidential information in model parameters and limits auditability and privacy. We evaluate a retrieval-based framework that separates clinical data from the language model by storing patient records in externally governed memory. On a radiology report summarisation task, retrieval recovers 32–67% of fine-tuning gains (perplexity, ROUGE-L; p < 0.0001), while hallucination reduction remains a focus for future work.