Old vs. new local ancestry inference in HCHS/SOL: a comparative study
Abstract
Hispanic/Latino populations are admixed, with genetic contributions from multiple ancestral populations. To uncover genetic associations in these populations, researchers often turn to admixture mapping, which relies on inferred counts of “local” ancestry, i.e., the ancestral population of origin at a given locus. Local ancestries are inferred using external reference panels that represent the ancestral populations, making the choice of inference method and reference panel critical. This study used a dataset of Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) to evaluate how updates in local ancestry inference (LAI) affect results. Specifically, we compared the ‘old’ LAI, performed using the popular inference method RFMix, with ‘new’ inferences performed using Fast Local Ancestry Estimation (FLARE) and an updated reference panel. We compared the two sets of inferences in terms of global and local ancestry correlations, as well as admixture mapping-based associations. Overall, the old and new inferences produced highly similar global and local ancestry estimates, and FLARE-based results closely matched those from RFMix in admixture mapping analyses. However, in some genomic regions, the old and new local ancestries showed relatively lower correlations (Pearson R < 0.9). Most of these regions (86.42%) mapped to either ENCODE blacklist regions or gene clusters, compared to 7.67% of randomly matched regions with high correlations (Pearson R > 0.97). These findings show that old and new inferences largely agree and suggest that regions of lower agreement are mostly due to genomic sequence contexts that lead to less stable inference, rather than to the LAI software or genotyping technology used.
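As an illustration of the per-locus comparison described in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes that, for a single ancestry, the old and new inferences are each available as an array of local ancestry counts (0, 1, or 2 copies) with one row per individual and one column per locus. The function names `per_locus_pearson` and `flag_low_agreement` and the toy data are hypothetical; only the threshold of 0.9 comes from the text.

```python
import numpy as np


def per_locus_pearson(old_counts, new_counts):
    """Pearson R across individuals at each locus.

    Both inputs are arrays of shape (n_individuals, n_loci) holding local
    ancestry counts for one ancestry (an assumed input layout).
    """
    old = np.asarray(old_counts, dtype=float)
    new = np.asarray(new_counts, dtype=float)
    # Center each locus (column) on its mean across individuals.
    old_c = old - old.mean(axis=0)
    new_c = new - new.mean(axis=0)
    num = (old_c * new_c).sum(axis=0)
    den = np.sqrt((old_c ** 2).sum(axis=0) * (new_c ** 2).sum(axis=0))
    # Loci with no variation in either inference have an undefined R (NaN).
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(den > 0, num / den, np.nan)
    return r


def flag_low_agreement(r, threshold=0.9):
    """Indices of loci whose correlation falls below the threshold."""
    # NaN compares as False, so undefined loci are not flagged here.
    return np.flatnonzero(r < threshold)


# Toy data: 4 individuals, 2 loci (made up for illustration only).
old = np.array([[0, 1], [1, 2], [2, 0], [1, 1]])
new = np.array([[0, 1], [1, 0], [2, 2], [1, 1]])

r = per_locus_pearson(old, new)
low = flag_low_agreement(r)
```

In this toy example the two inferences agree exactly at the first locus and disagree at the second, so only the second locus is flagged. In a real analysis, flagged loci would then be merged into regions and checked against annotations such as the ENCODE blacklist, a step this sketch does not cover.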