Network Construction Using Sparse Gaussian Graphical Model Based on GWAS Summary Statistics
Abstract
In genome-wide association studies (GWAS), thousands of genetic variants are tested for association with a phenotype. GWAS have identified many genetic variants that are strongly associated with phenotypes and have greatly enhanced our understanding of the genetic architecture of complex phenotypes and diseases. Joint analysis of multiple phenotypes can increase the overall statistical power to detect genetic associations and allows for the identification of pleiotropic loci. A phenotype-phenotype network (PPN) represents phenotypes as nodes and the relationships between them as edges. This representation allows complex relationships between phenotypes to be visualized, making it easier to identify clusters and to grasp intuitively how different phenotypes are related. In this study, we propose a new method to construct a PPN using the sparse Gaussian Graphical Model (sGGM) based on GWAS summary statistics. This approach isolates the direct relationship between each pair of phenotypes conditional on the other phenotypes, making it easier to identify clusters of phenotypes that reflect more meaningful biological or functional connections. We then apply a community detection method to partition the phenotypes into disjoint modules based on the partial correlation matrix of phenotypes. For each module, various multiple phenotype association tests can be employed to test the association between a SNP and the phenotypes in that module. We conducted a comprehensive simulation study to compare the performance of several multiple phenotype association tests applied to network modules obtained from sGGM, to network modules obtained from the correlation matrix of phenotypes, and to all phenotypes without modular segmentation.
The simulation results demonstrated that most of the multiple phenotype association tests based on network modules from sGGM not only effectively control the Type I error rates but also exhibit higher power than the same tests based on network modules derived from the correlation matrix or applied to all phenotypes without modular segmentation. We applied this method to the GWAS summary statistics of 92 phenotypes from chapter IX of the UK Biobank. The results showed that applying the multiple phenotype association tests to the network modules from sGGM detects more significant SNPs than applying them to the network modules from the correlation matrix.
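The pipeline described above (sparse precision estimation, conversion to partial correlations, partition into modules) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the phenotype correlation matrix is taken as already estimated from GWAS summary statistics, the sGGM is fitted with scikit-learn's `graphical_lasso`, the penalty `alpha=0.1` is arbitrary, and connected components of the partial-correlation graph stand in for the community detection step, whose algorithm the abstract does not name. The helper names `partial_correlations` and `modules_from_correlation` are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import graphical_lasso


def partial_correlations(precision):
    """Convert a precision matrix into a partial correlation matrix."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def modules_from_correlation(corr, alpha=0.1, tol=1e-6):
    """Fit a sparse GGM and group phenotypes into disjoint modules.

    corr  : phenotype correlation matrix (assumed given).
    alpha : L1 penalty for the graphical lasso (arbitrary choice).
    tol   : partial correlations below this magnitude count as no edge.
    """
    # Sparse estimate of the inverse correlation (precision) matrix.
    _, precision = graphical_lasso(corr, alpha=alpha)
    pcor = partial_correlations(precision)

    # Edge between two phenotypes when their partial correlation is nonzero.
    adjacency = (np.abs(pcor) > tol).astype(float)
    np.fill_diagonal(adjacency, 0.0)

    # Stand-in for community detection: connected components of the graph.
    _, labels = connected_components(adjacency, directed=False)
    return pcor, labels


# Toy input: two groups of three phenotypes, correlated only within a group.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
zeros = np.zeros((3, 3))
corr = np.block([[block, zeros], [zeros, block]])

pcor, labels = modules_from_correlation(corr, alpha=0.1)
print(labels)
```

In this toy input the two groups share no correlation, so the sketch should assign them to separate modules. With real data, a modularity-based community detection method applied to the weighted partial-correlation graph would be closer to what the abstract describes, and `alpha` would need to be tuned.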