Impact of sample size and tissue relevance on T2D gene identification
Abstract
Identifying the genes and proteins that mediate the activity of GWAS variants requires molecular data from disease-relevant tissues, but these may be difficult to collect. Using multiple gene expression reference datasets and GWAS summary statistics for type 2 diabetes (T2D), we identified 1,818 unique genes associated with T2D. Comparing the performance of different reference datasets, we found that sample size, and not the relevance of the tissue to the disease, was the critical factor in identifying relevant genes. Genes implicated using a well-powered expression dataset were also more likely to have multiple lines of genetic evidence. A targeted proteomics reference dataset from plasma samples showed similar power to identify T2D-related proteins as a gene expression dataset of the same sample size. Accounting for BMI reduced power across all tissues and phenotypes by ∼30%, suggesting that many GWAS links to T2D are mediated by BMI, potentially implicating insulin resistance-related effects. Finally, using data from smaller GWAS studies with precisely defined T2D subtypes uncovered genes directly relevant to each subtype, such as LST1, an immune response gene, for Severe Autoimmune Diabetes and TRMT2A, involved in beta-cell apoptosis, for Severe Insulin Deficient Diabetes. Our work demonstrates the benefits of well-powered reference datasets in accessible tissues and well-defined disease subtypes when studying complex diseases involving multiple tissues.