Balancing selection on genomic deletion polymorphisms in humans

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Detecting and quantifying balancing selection is a notoriously difficult challenge. In this study, the authors use both empirical analyses and simulations to characterize the amount of balancing selection in the human genome, focusing specifically on the contribution of polymorphic deletions. These results will be of interest to population and human geneticists. Although the presented evidence supports some degree of balancing selection among shared ancient polymorphisms, these findings primarily rely on the elimination of alternative explanations rather than a direct estimation of the extent of balancing selection. The conclusions are also based on simulations of a single demographic model without testing the robustness to other plausible model parameters.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, genome-wide investigation of ancient and possibly adaptive deletions remains an imperative exercise. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess in deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that it is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely more common mechanisms. Our results suggest that balancing selection resulted in ancient deletions harboring disproportionately more exonic variants with GWAS (genome-wide association studies) associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Detecting and quantifying balancing selection is a notoriously difficult challenge. Because the distribution of times to fixation or removal of strictly neutral variants has a long tail, it can be hard to exclude the null hypothesis of neutrality when testing for balancing selection that was not established so long ago that trans-specific variants can be observed. As Aqil et al. point out, most efforts to detect balancing selection in the human genome have been focused on single nucleotide variants. The authors seek to characterize the amount of balancing selection specific for polymorphic deletions. The authors justify their focus based on the fact that deletions are more likely to have functional consequences than single nucleotide variants, making it more likely that if they have remained for many generations, this could be a signature of balancing selection. That said, multiple aspects of the analysis deserve more attention.

    I have two broad concerns about the manuscript that the authors need to address. First, the authors use neutral simulations to exclude that neutrality alone can explain the amount of allele sharing observed between African modern humans and the archaic genomes. My concern is that human demography models, including the one from Gravel et al. (2011) used by the authors, are always simplifications of the complex demographic events that shaped human populations during evolution. In the case of the specific model used by the authors, African populations were inferred by the Gravel et al. model to have a constant population size for the past ~150,000 years (parameters Taf and Naf in the original model). This is an unrealistic assumption of this model. In brief, I am wondering how much the claim of the authors that neutrality alone cannot explain patterns of allele sharing is potentially based on mis-specifications of the neutral demography model. For example, the more fine-scale fluctuations of effective population sizes in Africa inferred by author L. Speidel in 2019 Nature (Figure 3) paint a different picture than the Gravel et al. model. The authors need to run extensive testing of the robustness of their conclusions to changes in the neutral demographic model used. What if the average ancestral population size was closer to 20,000? What if it was closer to 50,000 and frequency fluctuations every generation were smaller? Given how uncertain past population sizes really were and the current uncertainties about demographic reconstruction in particular relative to linked selection, the authors need to explore a range of past population sizes beyond the idiosyncrasies of a specific model.

    These are great suggestions. Based on them, we have now conducted 37 additional simulations with different sets of parameters, including adding the Speidel et al. model to the mix (the new Figure 1C). As discussed above (please refer to our response to the general reviews) and in the Results section, realistic neutral scenarios cannot explain the excess allele sharing.
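
    The dependence on population size that this exchange turns on can be illustrated with a toy neutral Wright-Fisher simulation. The sketch below is not the authors' pipeline and does not implement the Gravel et al. or Speidel et al. models; the function name `persistence_fraction`, the starting frequency, and the deliberately small population sizes are assumptions chosen so that the example runs quickly in pure Python. It only shows the qualitative point that, for a fixed number of generations, a neutral polymorphism is more likely to still be segregating when the population is larger.

```python
import random


def persistence_fraction(n_e, start_freq, generations, replicates, seed=0):
    """Fraction of neutral replicates that are still polymorphic.

    Toy Wright-Fisher model: constant diploid size n_e (2 * n_e gene copies),
    no mutation, no selection, one biallelic locus per replicate.
    """
    rng = random.Random(seed)
    two_n = 2 * n_e
    still_segregating = 0
    for _ in range(replicates):
        count = round(start_freq * two_n)
        for _ in range(generations):
            p = count / two_n
            # Binomial sampling of the next generation's 2N gene copies.
            count = sum(1 for _ in range(two_n) if rng.random() < p)
            if count == 0 or count == two_n:
                break  # allele lost or fixed: no longer polymorphic
        if 0 < count < two_n:
            still_segregating += 1
    return still_segregating / replicates


# Hypothetical, scaled-down parameters (NOT estimates for human populations).
small = persistence_fraction(n_e=25, start_freq=0.5, generations=100, replicates=200)
large = persistence_fraction(n_e=100, start_freq=0.5, generations=100, replicates=200)
print(f"N=25:  {small:.2f} of replicates still polymorphic")
print(f"N=100: {large:.2f} of replicates still polymorphic")
```

    Because drift acts on a timescale of roughly 2N generations, the same elapsed time removes far more neutral polymorphism in the smaller population, which is why the assumed ancestral population size matters for the neutral expectation of allele sharing.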

    My second broad concern is that it is difficult to evaluate how novel the findings really are. It is true that the authors focus on deletions while past scans for balancing selection in the human genome focused on SNVs. But it could be the case that a substantial number of the deletions identified here as under balancing selection could have previously been identified as such loci through linked SNVs by the scans cited by the authors. The authors need to provide quantification of how many of their deletions are truly novel balancing selection loci as opposed to balancing selection loci already identified through linked SNVs.

    The reviewer is right. We have now compared our results with previous genome-wide studies, which have been notoriously inconsistent with each other. We found that virtually all of our candidates are novel, as described in our response to the general reviews and our Results section.

    The novelty of the balanced deletions will also be better established by providing a more quantitative and less anecdotal functional analysis. It is true that the deletions include immune loci, but are they statistically enriched for immune loci as annotated for example by Gene Ontology, in a way that shows that their distribution across the genome is not random but indeed driven by selection enriching them at loci with specific functions? In addition, do the pie charts in Figure 5E represent a statistically significant deviation from left to right or not?

    We appreciate the reviewers’ suggestions, which led us to conduct a series of very fruitful analyses. As discussed above, we have now found that ancient deletions are significantly more likely to have GWAS traits and be exonic (Figure 5B) and significantly more likely to affect immunity, blood, and metabolism-related traits (Figure 5C). Moreover, we found that ancient deletions are depleted for smaller size categories but show significant enrichment for sizes at the 95th percentile and above (Figure 7A). We now discuss these findings in the Results section.
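
    One common way to ask whether a category such as "exonic with a GWAS association" is overrepresented among ancient deletions is a one-sided hypergeometric (Fisher-type) test. The sketch below illustrates that idea only: the source does not state which test the authors used, the function name `hypergeom_upper_tail` is hypothetical, and the counts are invented placeholders rather than values from the paper.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: total number of deletions considered
    K: deletions in the category of interest (e.g. exonic with a GWAS hit)
    n: number of ancient deletions
    k: ancient deletions that fall in the category
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 for impossible terms, so no extra bounds are needed.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented placeholder counts, for illustration only.
p_value = hypergeom_upper_tail(k=12, K=100, n=50, N=1000)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

    A small p-value here would indicate that the overlap between "ancient" and the functional category is larger than expected if ancient deletions were a random draw from all deletions; it would not by itself identify the mode of selection.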

    Reviewer #2 (Public Review):

    The authors assess evidence for balancing selection by looking at old polymorphisms where the derived allele is shared by descent with archaic humans, meaning the polymorphism must predate this split. Using simulations and several features of these old polymorphisms, they evaluate whether and what signatures of balancing selection are enriched in these polymorphisms. This is a well-explained and thorough analysis, and a clever way to approach a difficult question, yet the analysis remains fairly descriptive and the claims that can be made are not strong. For instance, the title of the paper does not state a particular finding of balancing selection, and several claims are hedged with "may", such as "A significant portion of ancient polymorphisms may have evolved under medium-term balancing selection" and "These results suggest that at least 27% of common functional deletion polymorphisms may have been evolving under balancing selection".

    We thank the reviewer for their insights. We agree that balancing selection is difficult to establish definitively. However, in our revisions, we have conducted several additional analyses based on the reviewers’ suggestions, as discussed under the individual comments. We believe that these analyses strengthen our claims.

    First, using simulations, they show there are more such ancient nonsynonymous and (indirectly) deletion variants than expected under a simple neutral model. The enrichment is nominal when compared only with Denisovan sharing, which could be explained due to some superarchaic ancestry in Denisovans (though not clear if that holds up quantitatively). The classification of the shared polymorphisms as recurrent, recently introgressed, or ancient shared by descent could be more carefully tested. In particular, I'm concerned about the possible inclusion of recurrent mutations among the ancient set. Although the age trend is consistent, it does not indicate how much misclassification there might still be. For example, there are "ancient" deletions that have inferred ages more recent than the human-archaic split (shown in Fig. 3).

    We agree that it is crucial to discriminate recurrent mutations from ancient ones in our analysis. We have now conducted additional analyses of allele frequency and CG content to further test for potential recurrent mutations in our datasets, as described in our response to the general reviews. We describe these in our Results section and Figure S1. In addition, we conducted even more stringent filtering requiring perfect LD and found that this increased stringency did not affect our results substantially. Thus, we believe that our pipeline identifies ancient deletions very conservatively and likely harbors a considerable number of false negatives, where ancient deletions are categorized as recurrent.

    The reviewer’s observation that some ancient deletions have recent dates is indeed interesting. The dating of individual alleles assumes neutrality and broadly depends on haplotype length and allele frequency. We believe that given the potential soft sweeps acting on these deletions, it is possible that the dates may be biased in some cases. For example, if there is a recent sweep on an ancient deletion, this may lead to longer haplotype lengths and, thus, a more recent date for these alleles. Therefore, the ancient derived alleles (those that are shared with archaic hominins) which happen to have recent allele dates may be of particular interest for future scrutiny. We now discuss this particular issue further in the Results section as follows:

    “Counterintuitively, some “ancient” deletions have very recent dates. This may be due to instances of recent soft sweeps involving some deletions leading to an increased length of the associated haplotype and an artificial decrease in age. Secondly, some ancient deletions may have low frequencies, which too creates a downward bias in age. Lastly, this may be due to rare instances of miscategorization of non-ancient deletions as ancient.”
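
    The direction of the bias described above can be seen with a deliberately crude approximation. Assuming a constant recombination rate r per base pair per generation, the ancestral haplotype around an allele of age t generations is broken on each side at rate r·t per base pair, so its expected total length is about 2/(r·t), which inverts to t ≈ 2/(r·L). This is a sketch for intuition only, not the dating method used in the study; the function name `naive_age_from_haplotype` and the rate of 1e-8 (about 1 cM/Mb) are assumptions.

```python
def naive_age_from_haplotype(total_length_bp, recomb_rate_per_bp=1e-8):
    """Crude allele age in generations from total shared haplotype length.

    Uses t ~= 2 / (r * L). Assumes neutrality, a constant recombination
    rate, and ignores allele frequency, all of which real methods model.
    """
    return 2.0 / (recomb_rate_per_bp * total_length_bp)


# Hypothetical haplotype lengths around the same deletion.
age_short_haplotype = naive_age_from_haplotype(10_000)    # 10 kb
age_long_haplotype = naive_age_from_haplotype(100_000)    # 100 kb
print(f"10 kb haplotype:  ~{age_short_haplotype:,.0f} generations")
print(f"100 kb haplotype: ~{age_long_haplotype:,.0f} generations")
```

    Under this approximation, a tenfold longer haplotype yields a tenfold younger estimate, so a recent sweep that lengthens the haplotype around an old deletion would pull its inferred age toward the present.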

    For the rest of the paper, the authors then focus on the deletion variants, showing that these ancient deletions show an elevated signature of balancing selection (stdbeta2) but do not show less variance in allele frequency over time as would be expected under an overdominance model. They infer the mechanism to be spatial or temporal variation in selection or negative frequency-dependent selection by process of elimination. They identify the subset of ancient deletion polymorphisms that overlap exons and are associated with phenotypes, finding a high proportion of ancient deletions that fall in both these categories. The identification of this set of potentially causal deletions that may be under balancing selection is a set that is of interest to the wider community for follow up (though several have already been the subject of study and individual publications from this lab). Overall, this is a useful combination of simulation work and assessment of an intriguing set of old deletion polymorphisms. Put together, the analysis does support evidence of balancing selection on some of them, but the extent is still not clear.

    We thank the reviewer. To further determine the extent of balancing selection acting on these ancient deletions, we conducted several enrichment analyses described above (please refer to our response to the general reviews) and in the paper. Briefly, we have now added Figures 5B, 5C and 7A to describe these new analyses.

  2. Evaluation Summary:

    Detecting and quantifying balancing selection is a notoriously difficult challenge. In this study, the authors use both empirical analyses and simulations to characterize the amount of balancing selection in the human genome, focusing specifically on the contribution of polymorphic deletions. These results will be of interest to population and human geneticists. Although the presented evidence supports some degree of balancing selection among shared ancient polymorphisms, these findings primarily rely on the elimination of alternative explanations rather than a direct estimation of the extent of balancing selection. The conclusions are also based on simulations of a single demographic model without testing the robustness to other plausible model parameters.


  3. Reviewer #1 (Public Review):

    Detecting and quantifying balancing selection is a notoriously difficult challenge. Because the distribution of times to fixation or removal of strictly neutral variants has a long tail, it can be hard to exclude the null hypothesis of neutrality when testing for balancing selection that was not established so long ago that trans-specific variants can be observed. As Aqil et al. point out, most efforts to detect balancing selection in the human genome have been focused on single nucleotide variants. The authors seek to characterize the amount of balancing selection specific for polymorphic deletions. The authors justify their focus based on the fact that deletions are more likely to have functional consequences than single nucleotide variants, making it more likely that if they have remained for many generations, this could be a signature of balancing selection. That said, multiple aspects of the analysis deserve more attention.

    I have two broad concerns about the manuscript that the authors need to address. First, the authors use neutral simulations to exclude that neutrality alone can explain the amount of allele sharing observed between African modern humans and the archaic genomes. My concern is that human demography models, including the one from Gravel et al. (2011) used by the authors, are always simplifications of the complex demographic events that shaped human populations during evolution. In the case of the specific model used by the authors, African populations were inferred by the Gravel et al. model to have a constant population size for the past ~150,000 years (parameters Taf and Naf in the original model). This is an unrealistic assumption of this model. In brief, I am wondering how much the claim of the authors that neutrality alone cannot explain patterns of allele sharing is potentially based on mis-specifications of the neutral demography model. For example, the more fine-scale fluctuations of effective population sizes in Africa inferred by author L. Speidel in 2019 Nature (Figure 3) paint a different picture than the Gravel et al. model. The authors need to run extensive testing of the robustness of their conclusions to changes in the neutral demographic model used. What if the average ancestral population size was closer to 20,000? What if it was closer to 50,000 and frequency fluctuations every generation were smaller? Given how uncertain past population sizes really were and the current uncertainties about demographic reconstruction in particular relative to linked selection, the authors need to explore a range of past population sizes beyond the idiosyncrasies of a specific model.

    My second broad concern is that it is difficult to evaluate how novel the findings really are. It is true that the authors focus on deletions while past scans for balancing selection in the human genome focused on SNVs. But it could be the case that a substantial number of the deletions identified here as under balancing selection could have previously been identified as such loci through linked SNVs by the scans cited by the authors. The authors need to provide quantification of how many of their deletions are truly novel balancing selection loci as opposed to balancing selection loci already identified through linked SNVs.

    The novelty of the balanced deletions will also be better established by providing a more quantitative and less anecdotal functional analysis. It is true that the deletions include immune loci, but are they statistically enriched for immune loci as annotated for example by Gene Ontology, in a way that shows that their distribution across the genome is not random but indeed driven by selection enriching them at loci with specific functions? In addition, do the pie charts in Figure 5E represent a statistically significant deviation from left to right or not?

  4. Reviewer #2 (Public Review):

    The authors assess evidence for balancing selection by looking at old polymorphisms where the derived allele is shared by descent with archaic humans, meaning the polymorphism must predate this split. Using simulations and several features of these old polymorphisms, they evaluate whether and what signatures of balancing selection are enriched in these polymorphisms. This is a well-explained and thorough analysis, and a clever way to approach a difficult question, yet the analysis remains fairly descriptive and the claims that can be made are not strong. For instance, the title of the paper does not state a particular finding of balancing selection, and several claims are hedged with "may", such as "A significant portion of ancient polymorphisms may have evolved under medium-term balancing selection" and "These results suggest that at least 27% of common functional deletion polymorphisms may have been evolving under balancing selection".

    First, using simulations, they show there are more such ancient nonsynonymous and (indirectly) deletion variants than expected under a simple neutral model. The enrichment is nominal when compared only with Denisovan sharing, which could be explained due to some superarchaic ancestry in Denisovans (though not clear if that holds up quantitatively). The classification of the shared polymorphisms as recurrent, recently introgressed, or ancient shared by descent could be more carefully tested. In particular, I'm concerned about the possible inclusion of recurrent mutations among the ancient set. Although the age trend is consistent, it does not indicate how much misclassification there might still be. For example, there are "ancient" deletions that have inferred ages much more recent than the human-archaic split (shown in Fig. 3).

    For the rest of the paper, the authors then focus on the deletion variants, showing that these ancient deletions show an elevated signature of balancing selection (stdbeta2) but do not show less variance in allele frequency over time as would be expected under an overdominance model. They infer the mechanism to be spatial or temporal variation in selection or negative frequency-dependent selection by process of elimination. They identify the subset of ancient deletion polymorphisms that overlap exons and are associated with phenotypes, finding a high proportion of ancient deletions that fall in both these categories. The identification of this set of potentially causal deletions that may be under balancing selection is a set that is of interest to the wider community for follow up (though several have already been the subject of study and individual publications from this lab). Overall, this is a useful combination of simulation work and assessment of an intriguing set of old deletion polymorphisms. Put together, the analysis does support evidence of balancing selection on some of them, but the extent is still not clear.