LoGicAl: Local Ancestry and Genotype Calling Uncertainty-aware Ancestry-specific Allele Frequency Estimation from Admixed Samples
Abstract
In genetic epidemiology and population genetics research on admixed populations, the allele frequencies within each contributing ancestral population, i.e., ancestry-specific allele frequencies, are fundamental parameters. They inform ancestry-enriched genetic drivers of disease etiology, improve GWAS replication cohort design, enhance polygenic risk prediction and portability, and provide insight into the demographic history of these populations. Current methods for estimating ancestry-specific allele frequencies typically rely on "best-guess" ancestry and genotype calls, thereby ignoring uncertainty from the upstream ancestry calling and genotyping procedures. Here, we introduce LoGicAl, a novel method that estimates ancestry-specific allele frequencies via an accelerated expectation-maximization algorithm while simultaneously accounting for uncertainty from ancestry calling, genotyping, and statistical phasing to refine its estimates. Simulations and real-data applications demonstrate that ignoring these uncertainties inflates bias in allele frequency estimates, and show that LoGicAl has superior accuracy and scalability when applied to sequencing and array genotyping data with different levels of local ancestry inference quality. Thus, LoGicAl can facilitate genomic analyses of admixed populations from biobank-scale data by providing precise ancestry-specific allele frequency estimates, which promote understanding of the landscape and dynamics of genetic variation in admixed populations at a finer scale.
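To make the general idea concrete, the following is a minimal sketch of a plain (non-accelerated) expectation-maximization update for a single biallelic variant. It is not the LoGicAl implementation. It assumes, purely for illustration, that each haplotype comes with a vector of local ancestry probabilities and a pair of allele likelihoods (reference, alternate), and it ignores phasing uncertainty. The function name `em_ancestry_specific_af` and its parameters are hypothetical.

```python
import numpy as np

def em_ancestry_specific_af(anc_prob, allele_lik, n_iter=200, tol=1e-8):
    """Illustrative EM for ancestry-specific allele frequencies at one variant.

    anc_prob   : (N, K) array; row i holds the probability that haplotype i
                 derives from each of K ancestries (assumed input format).
    allele_lik : (N, 2) array; column 0 is the likelihood of the data given
                 the reference allele, column 1 given the alternate allele.
    Assumes every ancestry carries nonzero total weight across haplotypes.
    """
    anc_prob = np.asarray(anc_prob, dtype=float)
    allele_lik = np.asarray(allele_lik, dtype=float)
    freq = np.full(anc_prob.shape[1], 0.5)  # initial alternate-allele frequencies

    for _ in range(n_iter):
        # E-step: joint posterior weight of (ancestry k, allele a) per haplotype.
        w_alt = anc_prob * freq * allele_lik[:, [1]]
        w_ref = anc_prob * (1.0 - freq) * allele_lik[:, [0]]
        norm = (w_alt + w_ref).sum(axis=1, keepdims=True)
        w_alt /= norm
        w_ref /= norm

        # M-step: expected alternate-allele count over expected haplotype count.
        new_freq = w_alt.sum(axis=0) / (w_alt + w_ref).sum(axis=0)
        converged = np.max(np.abs(new_freq - freq)) < tol
        freq = new_freq
        if converged:
            break
    return freq

# Toy usage: four haplotypes, two ancestries, with uncertain calls.
anc = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
lik = [[0.05, 0.95], [0.9, 0.1], [0.99, 0.01], [0.7, 0.3]]
print(em_ancestry_specific_af(anc, lik))
```

When every ancestry probability and allele likelihood is 0 or 1, the update reduces to counting alternate alleles within each ancestry, which corresponds to the "best-guess" approach described above. Soft inputs instead spread each haplotype's contribution across ancestries and alleles in proportion to their posterior weight.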