Optimizing Genetic Ancestry Adjustment in DNA Methylation Studies: A Comparative Analysis of Approaches


Abstract

Background: Genetic ancestry is an important factor to account for in DNA methylation studies because genetic variation influences DNA methylation patterns. When genotyping data are not available, one approach adjusts for ancestry using principal components (PCs) calculated from CpG sites that overlap with common SNPs. However, this method does not remove technical and biological variation, such as the effects of sex and age, before the PCs are calculated. The first PC is therefore often associated with factors other than ancestry.

Methods: We developed the adapted EpiAnceR+ approach, which 1) residualizes the CpG data overlapping with common SNPs for control probe PCs, sex, age, and cell type proportions to remove the effects of technical and biological factors, and 2) integrates the residualized data with genotype calls from the SNP probes present on the arrays (commonly referred to as rs probes) before calculating PCs. We then evaluated the clustering ability of the resulting PCs and their relationship to genetic ancestry.

Results: Compared to the original approach, the PCs generated by EpiAnceR+ led to improved clustering of repeated samples from the same individual and to stronger associations with genetic ancestry groups predicted from genotype information.

Conclusions: We show that the EpiAnceR+ approach improves the adjustment for genetic ancestry in DNA methylation studies. EpiAnceR+ can be integrated into existing R pipelines for commercial methylation arrays, such as 450K, EPICv1, and EPICv2. The code is available on GitHub (https://github.com/KiraHoeffler/EpiAnceR).
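
The two steps described in the Methods can be illustrated with a minimal sketch. This is not the EpiAnceR code, which is written in R and available at the GitHub link above; it is a NumPy illustration on synthetic data, and it makes several assumptions. The function names `residualize` and `ancestry_pcs` are hypothetical, only age and sex are used as covariates (the actual approach also includes control probe PCs and cell type proportions), and the combined matrix is only mean-centered before the PCs are computed, which may differ from the scaling used in the published method.

```python
import numpy as np


def residualize(values, covariates):
    """Regress each column of `values` on the covariates (plus an intercept)
    and return the residuals, i.e. the part not explained by the covariates."""
    design = np.column_stack([np.ones(values.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


def ancestry_pcs(cpg, covariates, genotype_calls, n_pcs=2):
    """Step 1: residualize the CpG data. Step 2: append the genotype calls,
    mean-center the columns, and compute PC scores via SVD.
    (Assumption: no additional scaling of the combined columns.)"""
    resid = residualize(cpg, covariates)
    combined = np.column_stack([resid, genotype_calls])
    centered = combined - combined.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Synthetic example: two ancestry groups, with age and sex as covariates.
rng = np.random.default_rng(0)
n_samples, n_cpgs, n_snps = 60, 40, 10
group = np.repeat([0, 1], n_samples // 2)
age = rng.uniform(20, 80, n_samples)
sex = rng.integers(0, 2, n_samples)
covariates = np.column_stack([age, sex])

# CpG values carry both an ancestry (group) effect and an age effect.
cpg = (0.5 + 0.1 * group[:, None] + 0.002 * age[:, None]
       + rng.normal(0.0, 0.02, (n_samples, n_cpgs)))

# Genotype calls (0/1/2) with group-specific allele frequencies.
allele_freq = np.where(group[:, None] == 1, 0.7, 0.2)
genotype_calls = rng.binomial(2, allele_freq, (n_samples, n_snps)).astype(float)

pcs = ancestry_pcs(cpg, covariates, genotype_calls)
print(pcs.shape)
```

In this toy setup the age effect is removed from the CpG data before the PCs are computed, so the leading PCs are driven by the group structure shared between the residualized CpG values and the genotype calls rather than by age.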
