Leveraging cis- and trans- variants to improve protein expression level prediction for proteome-wide association studies
Abstract
Since genetic effects are often mediated through proteins, the analysis of proteomic data can provide insights into disease etiology. However, most studies lack proteomic data. To address this problem, we developed TransCisPredict to perform proteome-wide association studies (PWAS) at biobank scale. TransCisPredict reduces the computational burden through linkage-disequilibrium block selection, which makes it feasible to incorporate both cis- and trans- variants when predicting protein expression, and it then performs protein-phenotype association analyses. To account for differences in protein regulatory architecture, four prediction methods are used for weight estimation: BayesR, Elastic Net, LASSO, and SuSiE. Five-fold cross-validation (CV) is used to select the optimal method for each protein. Weight estimation was performed using White British UK Biobank study subjects (N=42,644) with proteomic and genotype array data. Of the 2,920 available protein expression levels, 2,339 could be predicted with a CV-R² > 0.05 when cis- and trans- variants were used. Because most existing methods are limited to cis- variation, prediction was repeated for comparison using only cis- variants, which yielded 466 proteins with a CV-R² > 0.05. A PWAS of type 2 diabetes (T2D) was performed for the 2,339 predicted protein expression levels using White British UK Biobank study subjects without proteomic data (N=364,132). For validation, it was followed by two-sample Mendelian randomization using a method that controls for horizontal pleiotropy. Forty proteins were associated with T2D and validated. For the 466 cis- only predicted protein expression levels, three proteins were associated with T2D and validated. Incorporating both cis- and trans- variation using TransCisPredict allows many more proteins to be predicted than using cis- variants alone, thereby increasing the power of PWAS.
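The per-protein method selection described above can be illustrated with a minimal sketch. It is not the TransCisPredict implementation. It assumes that CV-R² is the squared Pearson correlation between observed expression and out-of-fold predictions pooled across the five folds, which the abstract does not specify. The function names `cv_r2` and `select_method` and the toy numbers are hypothetical and chosen only for illustration; the model fitting itself (BayesR, Elastic Net, LASSO, SuSiE) is omitted.

```python
from math import sqrt


def cv_r2(observed, predicted):
    """Squared Pearson correlation between observed and out-of-fold predictions.

    Assumption: this is one common definition of CV-R^2; the abstract does
    not state which definition the authors use.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # a constant vector explains no variance
    return (cov / sqrt(var_o * var_p)) ** 2


def select_method(observed, oof_predictions, threshold=0.05):
    """Pick the method with the highest CV-R^2 for one protein.

    oof_predictions maps a method name to its out-of-fold predictions.
    Returns (best_method, best_r2, passes_threshold).
    """
    scores = {name: cv_r2(observed, pred) for name, pred in oof_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] > threshold


# Toy data for a single hypothetical protein (not real measurements).
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof_predictions = {
    "LASSO": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2],
    "SuSiE": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "ElasticNet": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}

best_method, best_r2, keep = select_method(observed, oof_predictions)
print(best_method, round(best_r2, 3), keep)
```

In this sketch a protein is carried forward to the association step only when the best method exceeds the 0.05 threshold, mirroring the filter that yields 2,339 proteins with cis- and trans- variants and 466 with cis- variants only.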