PRISM-G: an interpretable privacy scoring method for assessing risk in synthetic human genome data
Abstract
The growing use of synthetic genomic data promises broader data access but raises unresolved concerns about privacy risk. We introduce PRISM-G, a model-agnostic framework that summarizes the privacy exposure of synthetic genomes across three complementary components: (i) a proximity view that asks whether synthetic individuals lie unusually close to real genomes in genetic-coordinate space; (ii) a kinship view that detects replay of familial or population-structure patterns beyond what is expected by chance; and (iii) a trait-linked view that captures exposure through rare variants and simple membership-inference signals. Each component yields a normalized risk score, and a risk-averse aggregation maps these scores to a 0–100 PRISM-G score. We evaluated PRISM-G on synthetic cohorts generated by a generative adversarial network (GAN), a restricted Boltzmann machine (RBM), and a logic-based SAT solver (Genomator). Our results show that privacy vulnerabilities concentrate along different axes across models and marker densities, underscoring that a single privacy-based similarity metric is insufficient.
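The abstract does not give the aggregation formula, so the following is only an illustrative sketch of what a risk-averse aggregation of three normalized component scores could look like. It assumes each component score lies in [0, 1] and uses a power mean with exponent `p > 1`, which pulls the result toward the worst component. The function name `risk_averse_score`, the parameter `p`, and the choice of a power mean are hypothetical and are not taken from the preprint.

```python
from typing import Sequence


def risk_averse_score(components: Sequence[float], p: float = 4.0) -> float:
    """Combine normalized risk scores in [0, 1] into a 0-100 score.

    ASSUMPTION: a power mean with exponent p >= 1 stands in for the
    paper's unspecified risk-averse aggregation. With p = 1 this is the
    arithmetic mean; larger p moves the result toward the maximum.
    """
    if not components:
        raise ValueError("need at least one component score")
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("component scores must lie in [0, 1]")
    if p < 1.0:
        raise ValueError("p must be >= 1 for a risk-averse aggregate")
    # Average the p-th powers, then take the p-th root and rescale to 0-100.
    mean_of_powers = sum(c ** p for c in components) / len(components)
    return 100.0 * mean_of_powers ** (1.0 / p)


# Hypothetical component scores: proximity, kinship, trait-linked.
example = risk_averse_score([0.2, 0.1, 0.9])
```

Under this assumed rule, a cohort with one high-risk component scores well above the plain average of its components, which is the behavior a risk-averse summary is meant to have: a single exposed axis cannot be masked by two safe ones.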