Image-Conditioned Diffusion for Privacy-Preserving Synthetic Medical Images

Doron Yaya-Stupp
Guy Lutsker
Or Spiegel-Yerushalmi
Eran Segal

Abstract

Medical imaging models depend on large, shareable datasets, yet privacy constraints limit data dissemination. Current text-conditioned diffusion models fail to preserve subtle, distributed clinical signals, such as continuous physiological biomarkers, rendering synthetic data insufficient for robust downstream physiological modeling. Here, we evaluate image-to-image (I2I) diffusion as a tunable, privacy-preserving transformation that produces a synthetic counterpart of real images while preserving downstream-relevant information. We fine-tune Stable Diffusion with low-rank adapters on retinal fundus photographs and chest radiographs, assessing fidelity, clinical signal preservation, cross-site transfer, and empirical re-identification risk. I2I consistently outperforms text-to-image generation in image fidelity and in preserving biomarker information. In cross-cohort transfer to an external retinal dataset from the UK Biobank, pretraining on I2I synthetic data performs comparably to real-image pretraining and surpasses it in the smallest fine-tuning sets. Varying I2I strength reveals that the privacy–utility tradeoff is highly modality-dependent: while retinal images achieve practical de-identification, chest X-rays exhibit structural combinatorics that leave them substantially re-identifiable even at high noise strengths, exposing critical boundaries for diffusion-based anonymization. These results position image-conditioned diffusion as a practical approach for generating shareable medical images with tunable de-identification.
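The "I2I strength" knob mentioned above can be made concrete. In common image-to-image diffusion implementations, such as the `strength` argument of the Hugging Face diffusers img2img pipelines, strength sets how far along the forward noising process the input is pushed before denoising begins. Higher strength therefore leaves less of the original signal, and that is the lever the abstract varies to trade utility for de-identification.

The sketch below illustrates this under stated assumptions and is not the authors' pipeline:

- It uses a plain linear DDPM beta schedule. Stable Diffusion itself uses a different, "scaled linear" schedule.
- It operates on a short list of numbers standing in for an image latent.
- The helper names `linear_alpha_bars` and `noise_to_strength` are hypothetical.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule.

    Assumption: a plain linear DDPM schedule, chosen for illustration only.
    """
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_to_strength(x0, strength, alpha_bars, rng):
    """Forward-noise a clean input to the timestep implied by `strength`.

    Hypothetical helper. Returns (t, x_t, signal_weight), where
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    The reverse (denoising) pass that would start from x_t is not shown.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    t = int(strength * len(alpha_bars))  # number of forward noising steps
    if t == 0:
        return 0, list(x0), 1.0  # strength 0: input passes through unchanged
    alpha_bar = alpha_bars[t - 1]
    signal_weight = math.sqrt(alpha_bar)
    noise_weight = math.sqrt(1.0 - alpha_bar)
    x_t = [signal_weight * x + noise_weight * rng.gauss(0.0, 1.0) for x in x0]
    return t, x_t, signal_weight


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars()
    latent = [0.5, -1.0, 0.25, 0.8]  # toy stand-in for an image latent
    for strength in (0.2, 0.5, 0.8, 1.0):
        t, _, w = noise_to_strength(latent, strength, alpha_bars, rng)
        print(f"strength={strength:.1f}  t={t:4d}  signal weight={w:.4f}")
```

Running the script prints the starting timestep and the remaining signal weight for several strengths. The weight shrinks toward zero as strength approaches 1, so higher strengths discard more of the source image. The abstract's chest X-ray result suggests that a small signal weight does not by itself guarantee de-identification.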

Version published to 10.64898/2026.05.04.722524 on bioRxiv
May 7, 2026

Related articles

Cross-Device Adaptation of Mirai for Mammography-Based Breast Cancer Risk Prediction

This article has 16 authors:
1. Adriana Sistig
2. Joseph H. Rothstein
3. Tejomay Gadgil
4. Ninah Achacoso
5. Stacey E. Alexeeff
6. Lawrence D. Gerstley
7. Robert J. Klein
8. Laurie R. Margolies
9. Albert Pu
10. Cara L. Smith Gueye
11. Marvella Villaseñor
12. Mark Westley
13. Laurel A. Habel
14. Vignesh A. Arasu
15. Weiva Sieh
16. Li Shen
This article has no evaluations. Latest version: Jun 17, 2026

Multi-Task TabGraphSyn: Graph-Based Synthetic EHR Generation with Improved Quality-Privacy Trade-offs for Opioid Use Disorder Prediction

This article has 2 authors:
1. Mohammad Arif Ul Alam
2. Sophia Shalhout
This article has no evaluations. Latest version: Apr 27, 2026

Can synthetic data overcome the privacy and fidelity bottleneck in Pharmacometrics? A comparative benchmark using a daptomycin population pharmacokinetic model

This article has 9 authors:
1. Alexandre Destere
2. Romain Lombardi
3. Marc Labriffe
4. Clément Benoist
5. Pierre Marquet
6. Thibaud Lavrut
7. Alexandre Gérard
8. Charles Bouveyron
9. Jean-Baptiste Woillard
This article has no evaluations. Latest version: Jun 2, 2026