SAFE-LD: A novel method for the estimation of linkage disequilibrium from summary statistics

Giulia Elizabeth de Sanctis
Sodbo Sharapov
Davide Bolognini
Francesca Ieva
Nicole Soranzo
Emanuele Di Angelantonio
Claudia Giambartolomei
Nicola Pirastu

Abstract

Genome-wide association studies (GWAS) have greatly advanced our understanding of the genetic architecture of complex traits. Downstream analyses of GWAS summary statistics require accurate in-sample linkage disequilibrium (LD), that is, the correlations between variants measured in the same individuals used for the GWAS, because even small discrepancies can propagate into substantial errors. In practice, privacy and consent restrictions prevent sharing of individual-level genotypes, forcing researchers either to rely on external reference panels, which reduce accuracy and power, or to store and distribute massive precomputed LD matrices that are inflexible and difficult to analyze. Here we introduce SAFE-LD (Shrinkage and Anonymisation Framework for LD Estimation), a novel method that produces pseudo-genotypes designed to reproduce the exact in-sample LD of a cohort while discarding all individual-level genetic content. SAFE-LD surrogates can be stored in VCF/PGEN formats and used seamlessly with standard pipelines, providing LD estimates indistinguishable from the originals but free from privacy concerns. Using extensive simulations on UK Biobank data, we show that SAFE-LD is robust across genomic regions and population sizes. Notably, SAFE-LD achieves fine-mapping accuracy on par with internal LD and significantly outperforms external LD, even under best-case conditions with cohort-matched reference panels. We further extend this framework through SAFE-LDss, which exploits existing published GWAS summary statistics in cases where numerous traits have been analyzed on the same samples. SAFE-LD offers a scalable, privacy-preserving, and highly accurate alternative to traditional LD estimation, enabling easy sharing and seamless use with standard tools. By storing compact pseudo-genotypes instead of massive precomputed LD matrices, it also provides a highly efficient solution in terms of disk space and data management, while safeguarding participant privacy and supporting precise fine-mapping.
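
The central claim of the abstract, that surrogate data can carry a cohort's in-sample LD without carrying any individual's genotypes, can be illustrated with a toy linear-algebra construction. The sketch below is not the SAFE-LD algorithm: the shrinkage step and the VCF/PGEN encoding named in the abstract are not reproduced, the function name `surrogate_with_same_ld` is hypothetical, and the output is continuous rather than genotype-like. It assumes more samples than variants and no monomorphic variants. It only shows that random noise, once orthonormalized and recoloured with a square root of the LD matrix, has exactly the same variant correlations as the original data.

```python
import numpy as np


def surrogate_with_same_ld(genotypes: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return continuous surrogate data whose column correlations match `genotypes`.

    Toy illustration only (hypothetical helper, not the SAFE-LD method).
    Assumes n - 1 >= p and that every variant column has non-zero variance.
    """
    n, p = genotypes.shape
    if n - 1 < p:
        raise ValueError("toy construction needs n - 1 >= p")

    # In-sample LD: Pearson correlation between variant columns.
    ld = np.corrcoef(genotypes, rowvar=False)

    # Square-root factor of the LD matrix, so that ld == a @ a.T.
    eigvals, eigvecs = np.linalg.eigh(ld)
    a = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    # Random noise that is unrelated to any individual in the cohort,
    # centered and then orthonormalized (columns stay zero-mean after QR).
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, p))
    noise -= noise.mean(axis=0)
    u, _ = np.linalg.qr(noise)

    # Recolour the orthonormal noise so its correlation matrix equals `ld`.
    return np.sqrt(n) * u @ a.T


# Usage: simulated 0/1/2 dosages sharing a latent factor, so variants are correlated.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 1)) + 0.8 * rng.standard_normal((500, 8))
genotypes = (latent > 0).astype(float) + (latent > 1).astype(float)
surrogate = surrogate_with_same_ld(genotypes)
ld_gap = np.abs(
    np.corrcoef(genotypes, rowvar=False) - np.corrcoef(surrogate, rowvar=False)
).max()
print(f"largest LD difference: {ld_gap:.2e}")
```

Because the surrogate is built from the LD matrix and fresh random noise alone, its rows do not correspond to any original participant, while any tool that computes correlations from it recovers the original LD up to floating-point error.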

Version published to 10.1101/2025.09.29.679154 on bioRxiv
Oct 1, 2025

Related articles

An Advanced Entropy Approach for Minimizing False Discoveries in Imputation-Based Association Analyses

This article has 4 authors:
1. Zhihui Zhang
2. Dakai Zhu
3. Xiangjun Xiao
4. Christopher I. Amos
This article has no evaluations
Latest version Dec 17, 2025

Application of longitudinal follow-up data increases power in the identification of genetic loci for type 2 diabetes

This article has 1 author:
1. Seong Beom Cho
This article has no evaluations
Latest version Dec 18, 2025

Causal effect heterogeneity estimation using summary statistics

This article has 8 authors:
1. Xingjie Shi
2. Yadong Yang
3. Minxi Bai
4. Jiacheng Miao
5. Stephen Dorn
6. Jonathan Haugstad
7. Jin Liu
8. Qiongshi Lu
This article has no evaluations
Latest version Jan 14, 2026
