LD Matrix Approximations for Scalable Analysis of High-dimensional Genetic Data

Ulises Bercovich
Shadi Zabad
Simon Gravel

Abstract

Linkage disequilibrium (LD) matrices are an essential component of many statistical genetics methods. However, their high dimensionality makes computing and storing them impractical for large genomic datasets. Common sparse approximations, such as banded matrices, sacrifice the positive semi-definite (PSD) property, which is critical for the numerical stability of many downstream analyses. Conversely, methods that guarantee a PSD approximation, such as block-diagonal approaches, can only approximate the LD structure coarsely. In this work, we present a novel method that approximates an LD matrix with a sparse, banded matrix that is guaranteed to be PSD while preserving the correlation structure within the band. We achieve this by reformulating the nearest correlation matrix problem in terms of the Cholesky decomposition. This formulation implicitly enforces the PSD property and lends itself to a highly scalable, parallel implementation. On whole-chromosome data from the 1000 Genomes Project and the UK Biobank, our method produces sparse PSD approximations that are more accurate than either block-diagonal or shrinkage estimators.
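The abstract describes the approach only at a high level. The following minimal NumPy sketch illustrates why a Cholesky-type parametrization can enforce bandedness and the PSD property at the same time. It rests on our own assumptions and is not the authors' algorithm or software. The function names `band_mask` and `banded_psd_approx`, the projected-gradient optimizer, the step size and the unit-norm row constraint are all hypothetical choices made for illustration. A toy AR(1) correlation matrix stands in for a real LD matrix. The key idea is that if `L` is lower-triangular with bandwidth `b`, then `C = L @ L.T` is PSD (it is a Gram matrix) and banded with bandwidth `b`. Normalizing the rows of `L` additionally forces a unit diagonal.

```python
import numpy as np


def band_mask(n: int, b: int) -> np.ndarray:
    """Hypothetical helper: boolean mask of entries with |i - j| <= b."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= b


def banded_psd_approx(R: np.ndarray, b: int, lr: float = 0.01, iters: int = 2000) -> np.ndarray:
    """Illustrative sketch (not the paper's method): fit C = L @ L.T to R inside the band.

    L is lower-triangular with bandwidth b, so C is PSD and banded by construction;
    rows of L are kept at unit length so that diag(C) = 1 (a correlation matrix).
    """
    n = R.shape[0]
    mask = band_mask(n, b)                 # band of C whose entries we try to match
    l_mask = np.tril(mask)                 # sparsity pattern allowed for the factor L
    L = np.eye(n)                          # feasible starting point: C = I
    for _ in range(iters):
        E = mask * (L @ L.T - R)           # residual restricted to the band
        grad = 4.0 * (E @ L) * l_mask      # gradient of ||E||_F^2, kept on L's pattern
        L = L - lr * grad                  # plain gradient step
        L /= np.linalg.norm(L, axis=1, keepdims=True)  # project rows onto the unit sphere
    return L @ L.T


# Usage on a toy AR(1) correlation matrix (a stand-in for an LD matrix).
n, b = 10, 1
idx = np.arange(n)
R = 0.9 ** np.abs(idx[:, None] - idx[None, :])
naive = np.where(band_mask(n, b), R, 0.0)  # plain banding: keep the band, zero the rest
C = banded_psd_approx(R, b)

print(np.linalg.eigvalsh(naive).min())     # about -0.73: banding alone broke PSD
print(np.linalg.eigvalsh(C).min())         # non-negative up to rounding error
```

Banding this toy matrix directly yields a tridiagonal matrix with a negative eigenvalue. The factor-based fit, by contrast, stays PSD, banded and unit-diagonal regardless of how far the optimizer converges. The price is that in-band entries are only matched approximately. A scalable implementation would exploit the banded structure instead of forming dense `n × n` products as this sketch does.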

Version published on bioRxiv (doi: 10.1101/2025.09.16.676478), Sep 18, 2025