Removing array-specific batch effects in GWAS mega-analyses by applying a two-step imputation workflow reveals new associations for thyroid volume and goiter

Abstract

Background

Combining individual-level data in genetic association studies (mega-analyses) enhances statistical power for identifying gene-trait associations. However, batch effects that arise when genotype data from different arrays are combined pose a major limitation. Here, we developed a two-step imputation workflow to overcome this array-type bias.

Methods

Genotype data from 10,647 individuals, generated using five different arrays, were included. Intermediate array-specific panels were generated and subsequently imputed against the 1000 Genomes Project Phase 3 reference panel. Batch effects in the cohort-combined imputed data were assessed by genetic principal component (PC) analysis. The workflow's performance was evaluated by comparing the imputation quality (r²) and allele frequency differences of the proposed two-step imputation with those of conventional array-specific imputation, and it was further validated against a whole-genome sequenced subgroup. We performed a genome-wide association study (GWAS) to test for genetic associations with goiter risk and thyroid gland volume, comparing the summary statistics of both approaches.
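The two evaluation metrics named above (imputation quality r² and allele frequency differences between approaches) can be illustrated with a minimal Python sketch. It assumes per-variant genotype dosages (0 to 2) for the same samples and the same, already matched variants from both imputation approaches. The quality estimate follows the MaCH/minimac-style idea of comparing the observed dosage variance with the variance 2p(1-p) expected under Hardy-Weinberg equilibrium; the article does not state its exact r² definition, and the function names and simulated data here are hypothetical.

```python
import numpy as np


def dosage_rsq(dosages: np.ndarray) -> np.ndarray:
    """Per-variant imputation-quality estimate from genotype dosages.

    dosages: (n_samples, n_variants) expected alt-allele counts in [0, 2].
    Ratio of observed dosage variance to the variance 2p(1-p) expected
    under Hardy-Weinberg equilibrium (a MaCH/minimac-style estimate).
    """
    p = dosages.mean(axis=0) / 2.0          # alt-allele frequency per variant
    expected_var = 2.0 * p * (1.0 - p)      # binomial variance under HWE
    observed_var = dosages.var(axis=0)      # empirical variance of dosages
    with np.errstate(divide="ignore", invalid="ignore"):
        # monomorphic variants (expected_var == 0) are reported as 0
        return np.where(expected_var > 0, observed_var / expected_var, 0.0)


def compare_allele_frequencies(dos_a: np.ndarray, dos_b: np.ndarray):
    """Squared Pearson correlation and absolute differences of allele frequencies.

    Assumes both matrices cover the same samples and the same variants in the
    same column order (e.g. after intersecting the two imputed datasets).
    """
    af_a = dos_a.mean(axis=0) / 2.0
    af_b = dos_b.mean(axis=0) / 2.0
    r = np.corrcoef(af_a, af_b)[0, 1]
    return r ** 2, np.abs(af_a - af_b)


# Hypothetical demo on simulated data: hard genotypes stand in for one
# approach, slightly noisy dosages for the other.
rng = np.random.default_rng(0)
true_af = rng.uniform(0.05, 0.5, size=1000)
geno = rng.binomial(2, true_af, size=(500, 1000)).astype(float)
noisy = np.clip(geno + rng.normal(0.0, 0.1, size=geno.shape), 0.0, 2.0)

rsq_noisy = dosage_rsq(noisy)
af_r2, abs_diff = compare_allele_frequencies(geno, noisy)
print(f"median Rsq: {np.median(rsq_noisy):.3f}")
print(f"allele-frequency r^2: {af_r2:.4f}, max |dAF|: {abs_diff.max():.4f}")
```

In real data the two dosage matrices would come from the two imputation runs, and variants would usually be filtered on the quality estimate before allele frequencies are compared.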

Results

The proposed workflow eliminated the batch effect from the first twenty genetic PCs. Its allele frequencies were also highly correlated with those of the conventional approach (r² > 0.99). GWAS results from the two-step imputation confirmed known associations for thyroid traits and revealed novel loci for thyroid volume (TG, PAX8, IGFBP5, NRG1) and one novel locus for goiter (XKR6), which did not reach statistical significance in the GWAS meta-analysis of conventionally imputed data.
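One way to check whether genetic PCs still carry array information is to measure, for each leading PC, the share of its variance explained by array membership (eta² from a one-way variance decomposition). The sketch below assumes a samples × variants dosage matrix and one array label per sample; eta² is an illustrative choice rather than necessarily the test used in the article, and the simulated "array B" shift is a hypothetical stand-in for a batch effect.

```python
import numpy as np


def genotype_pcs(geno: np.ndarray, n_pcs: int = 20) -> np.ndarray:
    """Sample scores on the top PCs of a (samples x variants) dosage matrix."""
    p = geno.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                 # drop monomorphic variants
    g, p = geno[:, keep], p[keep]
    # standardise each variant: centre by 2p, scale by sqrt(2p(1-p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def batch_eta_squared(pcs: np.ndarray, labels) -> np.ndarray:
    """Fraction of each PC's variance explained by the array label (eta^2)."""
    labels = np.asarray(labels)
    eta2 = np.empty(pcs.shape[1])
    for j in range(pcs.shape[1]):
        x = pcs[:, j]
        grand = x.mean()
        ss_total = ((x - grand) ** 2).sum()
        ss_between = sum(
            (labels == lab).sum() * (x[labels == lab].mean() - grand) ** 2
            for lab in np.unique(labels)
        )
        eta2[j] = ss_between / ss_total if ss_total > 0 else 0.0
    return eta2


# Hypothetical demo: two arrays, with an artificial dosage shift on part of
# the variants for array "B" to mimic an array-specific batch effect.
rng = np.random.default_rng(1)
n, m = 400, 500
labels = np.array(["A"] * 200 + ["B"] * 200)
af = rng.uniform(0.05, 0.5, size=m)
clean = rng.binomial(2, af, size=(n, m)).astype(float)
shifted = clean.copy()
shifted[labels == "B", :200] = np.clip(shifted[labels == "B", :200] + 0.5, 0.0, 2.0)

eta_clean = batch_eta_squared(genotype_pcs(clean, 10), labels)
eta_batch = batch_eta_squared(genotype_pcs(shifted, 10), labels)
print("max eta^2 without batch effect:", eta_clean.max().round(3))
print("eta^2 of PC1 with batch effect:", eta_batch[0].round(3))
```

Values near 0 mean a PC carries little array information, while values near 1 mean the PC largely separates the arrays; in practice PCs are usually computed on LD-pruned variants, which this sketch omits.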

Conclusion

Our imputation workflow provides high-quality imputation results without technical batch effects, fostering mega-analyses that combine multiple genotyping arrays for various genetic association analyses.