Membership Disclosure Evaluation for Synthetic Clinical Text Generated by LLM via Dynamic Few-Shot In-Context Learning


Abstract

Purpose: Large language models (LLMs) are increasingly used to generate synthetic clinical notes for data sharing, but membership inference attacks (MIAs) can reveal whether specific real notes appeared in prompts or training data. Prior studies often examine static few-shot prompts, i.e., fixed exemplar notes that remain the same across queries. We investigate MIAs on synthetic clinical text datasets as an evaluation metric for membership disclosure risk from the perspective of data custodians, focusing on synthetic outputs generated by dynamic few-shot in-context learning (ICL) with LLMs. Our goal is to determine whether the membership disclosure risk of the generative pipeline can be estimated using only the real dataset, by simulating MIAs on synthetic outputs with a limited adversarial dataset.

Methods: We develop a custodial auditing framework based on sentence-embedding similarity between real notes and the synthetic corpus, using three features: Max (nearest-neighbor similarity), Top-3 (mean of the three strongest similarities), and Mean (centroid alignment). We formalize membership under both static and dynamic ICL and introduce a population-aware partitioning protocol that preserves realistic priors for evaluation. We report F1 and an adjusted metric $M = (F_1 - F_{\max})/(1 - F_{\max})$, where $F_{\max}$ denotes the F1 score achieved by the naive adversarial strategy of classifying all records as members. Experiments span static and dynamic ICL with $k \in \{1,2,3\}$ and include extensive resampling to assess stability. The entire pipeline is text-only and requires no access to prompts, gradients, or model weights.

Results: Static 1-shot ICL exhibits high membership disclosure risk: Max and Top-3 achieve strong attack performance (high F1 and large positive $M$), indicating substantial leakage. Increasing the number of shots to 2 or 3 in static ICL markedly reduces risk, yielding $M$ below conservative thresholds and bringing performance close to membership-preserving regimes. Dynamic ICL consistently suppresses disclosure across all $k$, with F1 near chance and $M \approx 0$ or slightly negative, signifying negligible exploitable signal. The largest contrast between regimes occurs at 1-shot, underscoring prompt reuse as the primary driver of risk.

Conclusion: Dynamic few-shot ICL offers a practical mitigation for membership disclosure in synthetic clinical text. Static 1-shot prompting should be avoided for public releases; when static prompting is unavoidable, using 2–3 examples substantially mitigates leakage. Our text-only, model-agnostic auditing framework provides custodians with deployable tools and an interpretable metric for deciding release readiness. Ongoing experiments will expand datasets, LLMs, and auxiliary data sizes, and assess robustness to domain drift and prompt formatting changes.
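The auditing features and the adjusted metric described in the Methods can be sketched in a few lines. Given a cosine-similarity matrix between real notes (rows) and synthetic notes (columns), e.g. computed from sentence embeddings, each real note receives Max, Top-3, and Mean features, and $M$ rescales an attack's F1 against the all-members baseline $F_{\max}$. This is a minimal illustrative sketch under NumPy; the function names and layout are our assumptions, not the authors' released implementation.

```python
import numpy as np

def membership_features(sim: np.ndarray) -> dict:
    """Per-real-note similarity features from an (n_real x n_synth) matrix.

    Max   : nearest-neighbor similarity (strongest match in the synthetic corpus)
    Top-3 : mean of the three strongest similarities
    Mean  : average similarity, i.e. alignment with the synthetic corpus overall
    """
    top3 = np.sort(sim, axis=1)[:, -3:]  # three largest values per row
    return {
        "max": sim.max(axis=1),
        "top3": top3.mean(axis=1),
        "mean": sim.mean(axis=1),
    }

def adjusted_metric(f1: float, f_max: float) -> float:
    """M = (F1 - Fmax) / (1 - Fmax), where Fmax is the F1 of the naive
    strategy that labels every record a member. M > 0 means the attack
    beats that baseline; M near 0 or negative means negligible signal."""
    return (f1 - f_max) / (1.0 - f_max)
```

A custodian would threshold one of the three features to simulate the attack, compute its F1 on held-out member/non-member labels, and report $M$; under a member prevalence $p$, the all-members baseline evaluates to $F_{\max} = 2p/(1+p)$ (recall 1, precision $p$).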
