Convex approaches to isolate the shared and distinct genetic structures of subphenotypes in heterogeneous complex traits
Abstract
Groups of complex diseases, such as coronary heart diseases, neuropsychiatric disorders, and cancers, often display overlapping clinical symptoms and pharmacological treatments. The shared associations of genetic variants across diseases have the potential to explain their common underlying biological processes, but these shared associations remain poorly understood. To address this, we model the matrix of summary statistics of trait-associated genetic variants as the sum of a low-rank component, representing shared biological processes, and a sparse component, representing unique processes and arbitrarily corrupted or contaminated entries. We introduce Clorinn, an open-source Python software package that uses convex optimization algorithms to recover these components by minimizing a weighted combination of the nuclear norm of the low-rank component and the L1 norm of the sparse component. Among other advantages, Clorinn provides two significant benefits: (a) convex optimization guarantees the reproducibility of the components, and (b) the low-rank “uncorrupted” matrix allows robust singular value decomposition (SVD) and principal component analysis (PCA), which are otherwise highly sensitive to outliers and noise in the input matrix. In extensive simulations, we observe that Clorinn outperforms state-of-the-art approaches in capturing the shared latent factors across phenotypes. We apply Clorinn to estimate 200 latent factors from GWAS summary statistics of 2,110 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531) and of 14 psychiatric disorders.
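To make the low-rank plus sparse decomposition concrete, here is a minimal sketch of the classical principal component pursuit formulation of robust PCA: minimize ||L||_* + lam * ||S||_1 subject to L + S = M, solved with a basic alternating direction method of multipliers (ADMM). This is not Clorinn's code or API. The exact equality constraint, the default weight lam = 1/sqrt(max(n1, n2)), the penalty heuristic for mu, and every name below (robust_pca, soft_threshold, singular_value_threshold) are assumptions chosen for illustration; Clorinn's actual objective weighting, solver, and interface may differ. The toy data are simulated and stand in for a variants × traits matrix of summary statistics.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise shrinkage: the proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    """Shrink singular values by tau: the proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def robust_pca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Hypothetical helper: split m into low_rank + sparse via ADMM.

    Minimizes ||L||_* + lam * ||S||_1 subject to L + S = m (assumed form).
    """
    n1, n2 = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))  # common default weight (assumption)
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(m).sum())  # common penalty heuristic (assumption)
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)  # Frobenius norm
    for _ in range(max_iter):
        # Nuclear-norm step: update the low-rank (shared) component.
        low_rank = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        # L1 step: update the sparse (unique / corrupted) component.
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        # Dual ascent on the constraint L + S = m.
        residual = m - low_rank - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Simulated "variants x traits" matrix: rank-3 shared structure plus sparse outliers.
rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 100))
outliers = np.zeros_like(truth)
mask = rng.random(truth.shape) < 0.05
outliers[mask] = rng.uniform(-10, 10, size=mask.sum())
observed = truth + outliers

low_rank, sparse = robust_pca(observed)
rel_err = np.linalg.norm(low_rank - truth) / np.linalg.norm(truth)

# Plain rank-3 truncated SVD of the corrupted matrix, for comparison.
u, s, vt = np.linalg.svd(observed, full_matrices=False)
naive = (u[:, :3] * s[:3]) @ vt[:3]
naive_err = np.linalg.norm(naive - truth) / np.linalg.norm(truth)
print(f"robust PCA error: {rel_err:.2e}, truncated SVD error: {naive_err:.2e}")
```

The comparison at the end illustrates benefit (b): SVD or PCA applied to the recovered low-rank matrix is far less affected by the sparse outliers than a truncated SVD of the raw input. Because the objective is convex, rerunning the sketch converges to the same solution regardless of starting point, which is the property behind benefit (a).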