Sparse Polygenic Risk Score Inference with the Spike-and-Slab LASSO
Abstract
Large-scale biobanks, with comprehensive phenotypic and genomic data across hundreds of thousands of samples, provide ample opportunities to elucidate the genetics of complex traits and diseases. Consequently, there is a growing demand for robust and scalable methods for predicting disease risk from genotype data. Inference in this setting is challenging because of the high dimensionality of genomic data, especially when coupled with relatively small sample sizes. Popular Polygenic Risk Score (PRS) inference methods address this challenge by adopting sparse Bayesian priors or penalized regression techniques, such as the Least Absolute Shrinkage and Selection Operator (LASSO). However, the former class of methods is less scalable and does not produce exact sparsity, while the latter may over-shrink large coefficients. In this study, we present SSLPRS, a novel PRS method based on the Spike-and-Slab LASSO (SSL) prior, which offers a theoretical bridge between the two frameworks. We extend previous work to derive a coordinate-ascent inference algorithm for the SSLPRS model that operates on GWAS summary statistics. The software implementation, which uses state-of-the-art techniques to scale PRS inference to millions of genetic variants, has an average runtime of three minutes. To illustrate the statistical properties of the new model, we conducted experiments on nine quantitative phenotypes in the UK Biobank, which showed that SSLPRS is competitive with state-of-the-art methods in terms of prediction accuracy. The method also produces highly sparse effect size estimates without excessive shrinkage and, on average, selects 5% fewer variants than the LASSO.
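The abstract does not give the update rule, so the following is only a rough illustration of how a coordinate-ascent fit with an SSL penalty can run on summary statistics. It is a minimal sketch under stated assumptions: genotypes and phenotype are standardized, `beta_hat` holds marginal effect estimates, `R` is an LD matrix with unit diagonal, the mixing proportion `theta` and residual variance `sigma2` are held fixed, and the SSL adaptive penalty is evaluated at the current estimate (a local linear approximation) rather than using any exact thresholding rule. The function names, parameters, and defaults are hypothetical; this is not the SSLPRS software.

```python
import numpy as np


def ssl_adaptive_penalty(beta, lam0, lam1, theta):
    """Adaptive SSL penalty lam1 * p + lam0 * (1 - p), where p is the conditional
    probability that beta was drawn from the diffuse slab (rate lam1) rather than
    the concentrated spike (rate lam0). Hypothetical helper for illustration."""
    a = np.abs(beta)
    # Log-densities of the two Laplace components, weighted by the mixing proportion
    log_slab = np.log(theta) + np.log(lam1 / 2.0) - lam1 * a
    log_spike = np.log(1.0 - theta) + np.log(lam0 / 2.0) - lam0 * a
    p_slab = 1.0 / (1.0 + np.exp(log_spike - log_slab))
    return lam1 * p_slab + lam0 * (1.0 - p_slab)


def ssl_coordinate_ascent(beta_hat, R, n, lam0=50.0, lam1=0.5, theta=0.5,
                          sigma2=1.0, max_iter=100, tol=1e-8):
    """Coordinate-wise fit of the objective
        0.5 * b' R b - beta_hat' b + (sigma2 / n) * SSL_penalty(b)
    from marginal effects and an LD matrix (assumed to have unit diagonal).
    theta and sigma2 are held fixed; this is an illustrative sketch only."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    R = np.asarray(R, dtype=float)
    p = beta_hat.shape[0]
    beta = np.zeros(p)
    R_beta = np.zeros(p)  # running value of R @ beta, updated after each coordinate move
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Partial-residual correlation: beta_hat_j - sum_{k != j} R_jk * beta_k
            z = beta_hat[j] - (R_beta[j] - old)
            # Penalty evaluated at the current estimate, rescaled to the standardized scale
            thresh = sigma2 * ssl_adaptive_penalty(old, lam0, lam1, theta) / n
            new = np.sign(z) * max(abs(z) - thresh, 0.0)  # soft-thresholding step
            if new != old:
                R_beta += R[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Toy example: two variants in strong LD (r = 0.9), only the first is causal
R = np.array([[1.0, 0.9], [0.9, 1.0]])
beta_hat = R @ np.array([0.5, 0.0])  # marginal effects implied by the true effects
print(ssl_coordinate_ascent(beta_hat, R, n=1000))
```

On this toy input the second variant, which is only correlated with the causal one, is set exactly to zero, while the first estimate stays close to its marginal value of 0.5. Once a coefficient is clearly nonzero its penalty drops from roughly `lam0` to roughly `lam1`, which is the mechanism behind the "sparse without excessive shrinkage" behavior; a plain LASSO threshold of `lam0 / n` would instead shrink the large effect by about 0.05. Practical SSL fits typically also warm-start along a path of `lam0` values and update `theta`, both of which are omitted here.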