Image-Conditioned Diffusion for Privacy-Preserving Synthetic Medical Images
Abstract
Medical imaging models depend on large, shareable datasets, yet privacy constraints limit data dissemination. Current text-conditioned diffusion models fail to preserve subtle, distributed clinical signals, such as continuous physiological biomarkers, rendering synthetic data insufficient for robust downstream physiological modeling. Here, we evaluate image-to-image (I2I) diffusion as a tunable, privacy-preserving transformation that produces a synthetic counterpart of real images while preserving downstream-relevant information. We fine-tune Stable Diffusion with low-rank adapters on retinal fundus photographs and chest radiographs, assessing fidelity, clinical signal preservation, cross-site transfer, and empirical re-identification risk. I2I consistently outperforms text-to-image generation in image fidelity and in preserving biomarker information. In cross-cohort transfer to an external retinal dataset from the UK Biobank, pretraining on I2I synthetic data performs comparably to real-image pretraining and surpasses it for the smallest fine-tuning sets. Varying I2I strength reveals that the privacy–utility tradeoff is highly modality-dependent: while retinal images achieve practical de-identification, chest radiographs exhibit structural combinatorics that leave them substantially re-identifiable even at high noise strengths, exposing critical boundaries for diffusion-based anonymization. These results position image-conditioned diffusion as a practical approach for generating shareable medical images with tunable de-identification.
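To make the "I2I strength" knob concrete, the following is a minimal sketch of how a strength value in [0, 1] is commonly mapped to a forward-noising step in image-to-image diffusion. It is not the authors' pipeline: it assumes a plain linear DDPM beta schedule, works on a random pixel-space array rather than Stable Diffusion latents, and omits the fine-tuned denoiser entirely. The names make_alphas_cumprod and noise_to_strength are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def make_alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for an ASSUMED linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def noise_to_strength(image, strength, alphas_cumprod, rng):
    """Forward-diffuse `image` to the timestep implied by `strength` in [0, 1].

    Returns the noised image and the weight sqrt(alpha_bar_t) that multiplies
    the source image in the forward process (1.0 means the source is untouched).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    num_steps = len(alphas_cumprod)
    t = int(round(strength * num_steps)) - 1  # strength 0 -> no noising step
    if t < 0:
        return image.copy(), 1.0
    alpha_bar = alphas_cumprod[t]
    noise = rng.standard_normal(image.shape)
    # Closed-form forward process: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    noisy = np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, float(np.sqrt(alpha_bar))


rng = np.random.default_rng(0)
alphas_cumprod = make_alphas_cumprod()
image = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a real image

for strength in (0.0, 0.25, 0.5, 0.75, 1.0):
    noisy, weight = noise_to_strength(image, strength, alphas_cumprod, rng)
    corr = np.corrcoef(image.ravel(), noisy.ravel())[0, 1]
    print(f"strength={strength:.2f}  source_weight={weight:.3f}  corr_with_source={corr:.3f}")
```

In this toy setting, higher strength leaves less of the source image in the starting point for denoising, which is the mechanism behind the tunable de-identification described above. How much identifying structure the denoiser then restores depends on the model and modality, which this sketch does not capture.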