Optimizing genetic ancestry adjustment in DNA methylation studies: a comparative analysis of approaches
Abstract
Background
Genetic ancestry is an important factor to account for in DNA methylation studies because genetic variation influences DNA methylation patterns. When genotyping data are not available, one approach adjusts for ancestry using principal components (PCs) calculated from CpG sites that overlap with common SNPs. However, this method does not remove technical and biological variation, such as the effects of sex and age, before calculating the PCs. As a result, the first PC is often associated with factors other than ancestry.
Methods
We developed the adapted EpiAnceR+ approach, which (1) residualizes the CpG data overlapping with common SNPs for control probe PCs, sex, age, and cell type proportions to remove the effects of technical and biological factors, and (2) integrates the residualized data with genotype calls from the SNP probes (commonly referred to as rs probes) present on the arrays before calculating PCs. We then evaluated the clustering ability of the resulting PCs and their relationship to genetic ancestry.
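The two steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' R implementation: the function names (`residualize`, `ancestry_pcs`), the simulated inputs, the numeric coding of the rs-probe genotype calls as 0/1/2, and the absence of any scaling before PCA are all assumptions made for illustration only.

```python
import numpy as np


def residualize(values, covariates):
    """Regress each column of `values` on `covariates` (plus an intercept)
    by ordinary least squares and return the residuals."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def ancestry_pcs(cpg, covariates, rs_calls, n_pcs=2):
    """Hypothetical helper: residualize CpG data, append genotype calls,
    then compute principal component scores via SVD."""
    resid = residualize(cpg, covariates)           # step 1: remove covariate effects
    combined = np.column_stack([resid, rs_calls])  # step 2: add rs-probe genotype calls
    centered = combined - combined.mean(axis=0)    # center columns before PCA
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]                # sample scores on the first PCs


# Simulated data (assumption): 40 samples, 3 covariates standing in for
# control probe PCs, sex, age, and cell type proportions.
rng = np.random.default_rng(0)
n_samples = 40
covariates = rng.normal(size=(n_samples, 3))
cpg = rng.normal(size=(n_samples, 50)) + covariates @ rng.normal(size=(3, 50))
rs_calls = rng.integers(0, 3, size=(n_samples, 10)).astype(float)

pcs = ancestry_pcs(cpg, covariates, rs_calls)
```

In this sketch the residuals are uncorrelated with the covariates by construction, so the leading PCs can no longer be driven by the linear effects of those covariates. How the actual method weights or scales the residualized CpG data relative to the genotype calls is not stated in the abstract and is left out here.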
Results
The PCs generated by EpiAnceR+ led to improved clustering for repeated samples from the same individual and stronger associations with genetic ancestry groups predicted from genotype information compared to the original approach. EpiAnceR+ also outperformed the use of DNA methylation PCs or surrogate variables for ancestry adjustment.
Conclusions
We show that the EpiAnceR+ approach improves the adjustment for genetic ancestry in DNA methylation studies. EpiAnceR+ can be integrated into existing R pipelines for commercial methylation arrays, such as 450K, EPIC v1, and EPIC v2. The code is available on GitHub (https://github.com/KiraHoeffler/EpiAnceR).