Patterns of population structure and genetic variation within the Saudi Arabian population
Abstract
The Arabian Peninsula is considered the initial site of historic human migration out of Africa. The modern-day indigenous Arabians are believed to be the descendants who remained from the ancient split of the migrants into Eurasia. Here, we investigated how population history and cultural practices such as endogamy have shaped the genetic variation of Saudi Arabians. We genotyped 3,352 individuals and identified twelve genetic sub-clusters that corresponded to the geographical distribution of different tribal regions, differentiated by distinct components of ancestry based on comparisons to modern and ancient DNA references. These sub-clusters also showed variation across ranges of the genome covered by runs of homozygosity, as well as differences in population size changes over time. Using 25,488,981 variants found in whole genome sequencing (WGS) data from 302 individuals, we found that Saudis tend to show proportionally more deleterious alleles than neutral alleles when compared to Africans/African Americans from gnomAD (e.g. a 13% increase in deleterious alleles annotated by AlphaMissense at 0.5-5% frequency in Saudis, compared to a 7% decrease in benign alleles; P < 0.001). Saudi sub-clusters with greater inbreeding and lower effective population sizes also showed greater enrichment of deleterious alleles. Additionally, we found that approximately 10% of the variants discovered in our WGS data are not observed in gnomAD; these variants are also enriched for deleterious annotations. To accelerate the study of population-enriched deleterious alleles and their health consequences in this population, we have made available the allele frequency estimates of the 25,488,981 variants discovered in our samples. Taken together, our results suggest that the population history of Saudi Arabia shapes its pattern of genetic variation, with potential consequences for population health.
These results further highlight the need to sequence diverse and unique populations so as to provide a foundation on which to interpret medical and pharmacogenomic findings from these populations.