Sparse multitask group Lasso for genome-wide association studies

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

A critical hurdle in Genome-Wide Association Studies (GWAS) involves population stratification, wherein differences in allele frequencies among subpopulations within samples are influenced by distinct ancestry. This stratification implies that risk variants may be distinct across populations with different allele frequencies. This study introduces Sparse Multitask Group Lasso (SMuGLasso) to tackle this challenge. SMuGLasso is based on MuGLasso, which formulates this problem using a multitask group lasso framework in which tasks are subpopulations, and groups are population-specific Linkage-Disequilibrium (LD)-groups of strongly correlated Single Nucleotide Polymorphisms (SNPs). The novelty in SMuGLasso is the incorporation of an additional ℓ 1 -norm regularization for the selection of population-specific genetic variants. As MuGLasso, SMuGLasso uses a stability selection procedure to improve robustness and gap-safe screening rules for computational efficiency. We evaluate MuGLasso and SMuGLasso on simulated data sets as well as on a case-control breast cancer data set and a quantitative GWAS in Arabidopsis thaliana . We show that SMuGLasso is well suited to addressing linkage disequilibrium and population stratification in GWAS data, and show the superiority of SMuGLasso over MuGLasso in identifying population-specific SNPs. On real data, we confirm the relevance of the identified loci through pathway and network analysis, and observe that the findings of SMuGLasso are more consistent with the literature than those of MuGLasso. All in all, SMuGLasso is a promising tool for analyzing GWAS data and furthering our understanding of population-specific biological mechanisms.

Article activity feed