Rare-variant aggregate association analysis using imputed data is a powerful approach


Abstract

Imputation can cost-effectively generate genotypes for millions of variants. The power of performing rare variant aggregate association tests using imputed genotypes was evaluated. White Europeans from the UK Biobank with exome sequence and genotype array data were analyzed. Using the genotype array data, imputation was performed with the HRC r1.1 (N=64,976 haplotypes) and TOPMed r3 (N=267,194 haplotypes) reference panels. Simulations were used to compare the power of rare variant aggregate association analysis using sequence and imputed data. The number of genes with >2 rare variants (missense, nonsense, splice site) was approximately the same for exome sequence and TOPMed imputed data, but HRC imputed data had ∼10% fewer genes. For the imputed data, using a less stringent R² threshold (i.e., >0.3 vs. >0.8) led to greater power to detect aggregate associations because additional rare variants were included in the test. Exome sequence data provided the highest power for rare variant aggregate association testing, with TOPMed imputed variants usually showing less than a 20% reduction in power. HRC imputed variants provided substantially less power. We also performed rare variant aggregate association analyses using UK Biobank phenotype data, exome sequence data, and imputed variants for PCSK9 and low-density lipoprotein (N=159,904 study subjects) and for APOC3 and triglyceride levels (N=160,036 study subjects). For these analyses, even when only ultra-rare variants (minor allele frequency <0.001) were analyzed, significant aggregate associations could be detected for the exome sequence data and for both the TOPMed and HRC imputed variants. Although power for rare variant aggregate association tests is lower when analyzing imputed variants than when analyzing sequence data, owing to missing variants and uncertainty in genotypes, analysis of imputed data is a viable approach to detect rare variant aggregate associations when sequence data is unavailable.
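
To illustrate why a less stringent R² threshold brings more rare variants into an aggregate test, the following is a minimal sketch, not the authors' pipeline. It assumes a simple burden-style aggregation in which each subject's score is the sum of imputed dosages across the selected variants; the abstract does not state which aggregate test was used. The `ImputedVariant` class, the function names, the `max_maf` cutoff of 0.01, and the toy values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImputedVariant:
    maf: float            # minor allele frequency
    r2: float             # imputation quality (estimated R^2)
    dosages: List[float]  # expected minor-allele count per subject (0 to 2)


def select_variants(variants, max_maf, min_r2):
    """Keep rare variants whose imputation quality exceeds the threshold."""
    return [v for v in variants if v.maf < max_maf and v.r2 > min_r2]


def burden_scores(variants, n_subjects):
    """Sum dosages across variants to get one aggregate score per subject."""
    scores = [0.0] * n_subjects
    for v in variants:
        for i, dosage in enumerate(v.dosages):
            scores[i] += dosage
    return scores


# Toy data for three subjects (illustrative values only).
toy_variants = [
    ImputedVariant(maf=0.005, r2=0.9, dosages=[0.0, 1.0, 0.0]),
    ImputedVariant(maf=0.004, r2=0.5, dosages=[0.2, 0.0, 0.0]),
    ImputedVariant(maf=0.200, r2=0.95, dosages=[1.0, 1.0, 2.0]),  # common, excluded
]

# max_maf=0.01 is an assumed rare-variant cutoff; 0.3 and 0.8 are the
# R^2 thresholds compared in the abstract.
lenient = select_variants(toy_variants, max_maf=0.01, min_r2=0.3)
strict = select_variants(toy_variants, max_maf=0.01, min_r2=0.8)

lenient_scores = burden_scores(lenient, n_subjects=3)
strict_scores = burden_scores(strict, n_subjects=3)
```

In this toy case the lenient threshold retains two rare variants and the strict threshold retains one, so the first subject's dosage contributes to the aggregate score only under the lenient setting. The per-subject scores would then be tested for association with the phenotype, for example by regression.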
