Using Variation Graphs to Enhance Knowledge of Genomic Medicine and Population Genetics in Negev Bedouin


Abstract

Background The application of genomics in translational medicine has gained significant attention as a means to address health disparities. However, pangenomic approaches remain underutilised, with most studies relying on the hg38 linear reference. This prevents human evolution studies from fully leveraging whole-genome data. We evaluate how dependably a well-established variation graph construction algorithm infers effective population size (Ne) in the Bedouin compared with hg38, its effect on allele frequencies, and its potential impact on medical genetics studies such as GWAS. Our goal is to apply these insights to characterise genetic variation in East African populations.
Methods The variation graph was built using 14 Somali individuals from the Human Origins dataset, chosen for their ancestral affinity to East Africans. For Bedouin individuals from the HGDP, Ne was estimated with a coalescent approach ('Relate') from variants detected using the graph-based reference and compared with estimates from the hg38-derived variant calls. The quality of the estimates was evaluated through coalescent simulations. The Wilcoxon Rank Sum Test was used to assess allele frequency differences between the graph-based and linear references, applied both genome-wide and exclusively to Bedouin-specific GWAS hits.
Results Coalescent analyses revealed that graph-based variants yielded Ne ≈ 17, a 1000-fold decrease from Ne ≈ 79000 for the linear reference. Only the graph-based approach estimated Ne within the 95% CI from simulations (16-19 vs. 324-3468 for hg38). In addition, significantly lower rank sum statistics (p < 2.2 × 10^-16) were observed for variants from the hg38 reference compared to the graph alignment. Similarly, GWAS variants generated from the graph exhibited lower frequencies (p = 0.023), which could affect the power and interpretation of GWAS and other medical genetics analyses.
Conclusion The pangenomic approach, employing variation graphs, infers Ne in the Bedouin population more accurately than the hg38 reference. Our findings add to the growing evidence that the choice of reference affects genomic medicine as a whole, including GWAS, personalised medicine, and the understanding of genetic variation in diverse populations, which is crucial for equitable healthcare.
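
The allele frequency comparison described in the Methods can be illustrated with a small sketch. It is not the authors' pipeline: the function name `rank_sum_test` and the frequency values are hypothetical, and the test is implemented here with the standard normal approximation and tie correction, using only the Python standard library.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p


# Hypothetical allele frequencies for the same variants under each reference.
graph_freqs = [0.10, 0.12, 0.15, 0.20, 0.22]
hg38_freqs = [0.14, 0.18, 0.25, 0.30, 0.35]

u, z, p = rank_sum_test(graph_freqs, hg38_freqs)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

A negative z score here would indicate that the first sample tends to rank lower than the second. For real variant sets, a tested library routine would be preferable to this hand-rolled version.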
