Sparse Polygenic Risk Score Inference with the Spike-and-Slab LASSO

Junyi Song
Shadi Zabad
Archer Yang
Simon Gravel
Yue Li

Abstract

Large-scale biobanks, with comprehensive phenotypic and genomic data across hundreds of thousands of samples, provide ample opportunities to elucidate the genetics of complex traits and diseases. Consequently, there is a growing demand for robust and scalable methods for disease risk prediction from genotype data. Performing inference in this setting is challenging due to the high dimensionality of genomic data, especially when coupled with relatively small sample sizes. Popular Polygenic Risk Score (PRS) inference methods address this challenge by adopting sparse Bayesian priors or penalized regression techniques, such as the Least Absolute Shrinkage and Selection Operator (LASSO). However, the former class of methods is less scalable and does not produce exact sparsity, while the latter tends to over-shrink large coefficients. In this study, we present SSLPRS, a novel PRS method based on the Spike-and-Slab LASSO (SSL) prior, which offers a theoretical bridge between the two frameworks. We extend previous work to derive a coordinate-ascent inference algorithm that operates on GWAS summary statistics, which is orders of magnitude more efficient than corresponding individual-level-based implementations. To illustrate the statistical properties of the proposed model, we conducted experiments involving 9 simulation configurations and 9 quantitative phenotypes from the UK Biobank. Our results demonstrate that SSLPRS is competitive with state-of-the-art methods in terms of prediction accuracy and exhibits superior variable selection performance, especially in sparse genetic architectures. In simulations, this translates to an improvement of upwards of 50% in positive predictive value. In analyses of real phenotypes, we show that selected variants are highly enriched for meaningful genomic annotations and have better replication rates in larger meta-analyses.
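The abstract does not spell out the update rule, so the following is only a rough sketch of how a Spike-and-Slab LASSO penalty can drive a coordinate-wise update on summary statistics. It is not the SSLPRS algorithm. It assumes the standard SSL formulation (a mixture of two Laplace densities with rates lam0 for the spike and lam1 for the slab, lam0 much larger than lam1), a fixed mixing weight theta, standardized marginal effects r, and an LD (correlation) matrix R. All function names, parameter names, and default values are hypothetical, and the refined thresholding used in the SSL literature is replaced here by a plain soft-threshold.

```python
import numpy as np

def adaptive_penalty(beta_j, theta, lam0, lam1):
    """SSL-style adaptive penalty: a blend of the spike rate (lam0) and the
    slab rate (lam1), weighted by the conditional probability that beta_j
    came from the slab. Assumes lam0 > lam1 > 0."""
    # p_slab = theta*psi1 / (theta*psi1 + (1-theta)*psi0), written in a
    # form that avoids dividing two underflowing exponentials.
    ratio = (1.0 - theta) * lam0 / (theta * lam1)
    p_slab = 1.0 / (1.0 + ratio * np.exp(-(lam0 - lam1) * abs(beta_j)))
    return lam1 * p_slab + lam0 * (1.0 - p_slab)

def soft_threshold(z, t):
    """Standard LASSO soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def ssl_coordinate_sketch(r, R, n, theta=0.1, lam0=50.0, lam1=1.0,
                          sigma2=1.0, n_iter=100):
    """Cyclic coordinate updates using only summary statistics.

    r : marginal standardized effect estimates (length p)
    R : LD / correlation matrix (p x p)
    n : GWAS sample size
    theta, lam0, lam1, sigma2 are illustrative, fixed hyperparameters.
    """
    beta = np.zeros(len(r))
    for _ in range(n_iter):
        for j in range(len(r)):
            # Residualized marginal effect: remove the contribution of all
            # other variants through LD, keeping variant j's own term out.
            z = r[j] - R[j] @ beta + R[j, j] * beta[j]
            # The threshold depends on the current value of beta[j]:
            # near zero it is close to lam0, far from zero close to lam1.
            t = sigma2 * adaptive_penalty(beta[j], theta, lam0, lam1) / n
            beta[j] = soft_threshold(z, t) / R[j, j]
    return beta

# Toy usage with three uncorrelated variants (made-up numbers).
r = np.array([0.5, 0.001, 0.0])
beta = ssl_coordinate_sketch(r, np.eye(3), n=1000)
```

In this toy run the large effect is shrunk only slightly, because its penalty relaxes toward lam1 once the coefficient moves away from zero, while the two small effects are set exactly to zero. This mirrors the trade-off named in the abstract: exact sparsity without heavy shrinkage of large coefficients. Because the SSL objective is non-convex, a naive scheme like this one can depend on initialization.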

Version published to 10.1101/2025.01.28.25321292 on medRxiv
Jan 29, 2025
