Joint inference of evolutionary transitions to self-fertilization and demographic history using whole-genome sequences

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This manuscript details the valuable development of population genetics theory that can be used to infer past changes in the selfing rate in natural populations. The inference procedure is solid, although the comparison to previous estimates can be improved, and deeper insight could be gained from further theoretical exploration. The work will be of broad interest to the field of mating systems evolution.

Abstract

The evolution from outcrossing to selfing occurred recently across the eukaryote tree of life in plants, animals, fungi, and algae. Despite short-term advantages, selfing is hypothetically an evolutionary dead-end reproductive strategy. The tippy distribution on phylogenies suggests that most selfing species are of recent origin. Dating such transitions is challenging, yet central to testing this hypothesis. We build on previous theories to disentangle the effect of past changes in selfing rate from that of population size on recombination probability along the genome. This allowed us to develop two methods using full-genome polymorphisms to (1) test if a transition from outcrossing to selfing occurred and (2) infer its age. The teSMC and tsABC methods use a transition matrix summarizing the distribution of times to the most recent common ancestor along the genome to estimate changes in the ratio of population recombination and mutation rates over time. First, we demonstrate that our methods distinguish between past changes in selfing rate and demographic history. Second, we assess the accuracy of our methods in inferring transitions to selfing up to approximately 2.5 Ne generations ago. Third, we demonstrate that our estimates are robust to the presence of purifying selection. Finally, as a proof of principle, we apply both methods to three Arabidopsis thaliana populations, revealing a transition to selfing approximately 600,000 years ago. Our methods pave the way for studying recent transitions to self-fertilization and better accounting for variation in mating systems in demographic inferences.
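To illustrate why the ratio of population recombination to mutation rate carries a signal of selfing that population size alone cannot produce, the following is a minimal sketch under the standard coalescent rescaling for partial selfing (inbreeding coefficient F = σ/(2 − σ), effective size N/(1 + F), effective recombination rate r(1 − F)). This rescaling is assumed here for illustration; the function names and parameter values are hypothetical, and the teSMC and tsABC implementations may be parametrized differently.

```python
def inbreeding_coefficient(sigma):
    """Equilibrium inbreeding coefficient F for a selfing rate sigma in [0, 1]."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    return sigma / (2.0 - sigma)


def scaled_rates(n_diploid, r, mu, sigma):
    """Return (theta, rho) per site under the assumed selfing rescaling.

    n_diploid : number of diploid individuals (illustrative value)
    r, mu     : per-site recombination and mutation rates per generation
    sigma     : selfing rate
    """
    f = inbreeding_coefficient(sigma)
    n_eff = n_diploid / (1.0 + f)   # selfing speeds up coalescence
    r_eff = r * (1.0 - f)           # selfing reduces effective crossing-over
    theta = 4.0 * n_eff * mu
    rho = 4.0 * n_eff * r_eff
    return theta, rho


def rho_over_theta(r, mu, sigma):
    """Ratio rho/theta = r (1 - F) / mu; population size cancels out."""
    return r * (1.0 - inbreeding_coefficient(sigma)) / mu


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend.
    for sigma in (0.0, 0.5, 0.9, 0.99):
        print(sigma, rho_over_theta(r=1e-8, mu=1e-8, sigma=sigma))
```

Because population size cancels in the ratio, a change in population size rescales θ and ρ together, whereas a change in selfing rate shifts ρ/θ. Under these assumptions, that is the contrast a method tracking ρ/θ through time can exploit.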

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The shift from outcrossing to selfing is one of the most prevalent evolutionary events in flowering plants. The ecological and genetic backgrounds of these transitions have been of major interest for decades, and one of the key questions was the dating of this transition. Timing of pseudogenization of the self-incompatibility (SI) genes has been used as a proxy for this transition because loss-of-function mutations of SI genes are often responsible for the evolution of predominant selfing. However, SI genes are identified only in a limited number of taxa, and in some cases, the evolution of selfing is not necessarily associated with loss of SI. Therefore, an independent time estimate of the evolution of selfing by genome-wide polymorphism data has been considered important in this field.

    This study provides two statistical methods, one SMC-based and one ABC-based. Both aim to detect the genome-wide signatures of the outcrossing-to-selfing transition, which alters the ratio of population recombination rate to mutation rate. The authors validated these methods using simulated data, confirming that both can generally infer the timing of the outcrossing-to-selfing transition jointly with population size changes, although their precision depends on several population history settings.

    This study would be an important contribution to the field of mating system evolution. By applying the proposed methods to many other selfing organisms, we may be able to see a general picture of the timescale of the outcrossing-to-selfing transition combined with population size dynamics. At the same time, this is one of the extensions of the SMC method, which has already been well utilized for various inferences, including population size and recombination rate heterogeneity.

    We thank the reviewer for the positive comments and for acknowledging the novelty and relevance of our study for the field.

    I do not find a major weakness in the methodologies of this study, but I have a few comments on their application to the data of Arabidopsis thaliana. It is important to note that these estimates depend largely on the input data used, especially the mutation rate and recombination rate. While the authors claim that their estimate is older than Bechsgaard's estimate (<413 kyrs), these two studies used different mutation rates: the authors used Ossowski's mutation rate, and Bechsgaard used Koch's mutation rate (Koch et al. MBE 2010). To compare these two estimates, it is important to use the same mutation rate. Shimizu & Tsuchimatsu (2015; Ann Rev Eco Evo Syst) discussed this point in detail and showed that Bechsgaard's estimate becomes <1.48 myrs when Ossowski's mutation rate is used (see Figure 4). It then happens to overlap with the estimate of this study.

    Thank you very much for identifying this important problem. It is indeed critical to rescale Bechsgaard's age of the transition using the same mutation rate as used in our analysis (Ossowski et al. 2010). We now use the rescaled estimate published in your review (Shimizu and Tsuchimatsu 2015, Figure 4). We note that Bechsgaard et al. did not publish a measure of uncertainty around their estimate of the transition, making it difficult to compare it with our posterior distributions. However, Bechsgaard's estimate is not contained within the credibility intervals of our posteriors for t_sigma, and we therefore consider the two results significantly different. We have modified the text accordingly, at p. 4 l. 8-10 and p. 12 l. 27 to p. 13 l. 5.
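The rescaling discussed in this exchange can be sketched as simple arithmetic. A minimal sketch, assuming the age estimate is inversely proportional to the assumed mutation rate (as for a divergence-based date); the function name is hypothetical. The two mutation rates themselves are not given in this text, so the sketch uses only the ratio implied by the two bounds quoted above (<413 kyrs and <1.48 myrs).

```python
def rescale_age(age_years, mu_used, mu_target):
    """Rescale an age estimate from one assumed mutation rate to another.

    Assumes age is proportional to 1 / mutation rate, so
    age_target = age_used * mu_used / mu_target.
    """
    if mu_used <= 0 or mu_target <= 0:
        raise ValueError("mutation rates must be positive")
    return age_years * mu_used / mu_target


# Ratio (rate used originally / rate used for rescaling) implied by the two
# bounds quoted in the review; it is derived here, not taken from either paper.
IMPLIED_RATE_RATIO = 1.48e6 / 413e3

if __name__ == "__main__":
    print(round(IMPLIED_RATE_RATIO, 2))
    # Expressing the rates relative to the target rate (target = 1.0)
    print(rescale_age(413e3, IMPLIED_RATE_RATIO, 1.0))
```

The point of the sketch is only that two dates become comparable once both are expressed under the same mutation rate; it says nothing about the uncertainty around either estimate.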

    I am also concerned about the genomic regions of Arabidopsis thaliana used for this study. The authors chose five specific regions based on homogeneity of recombination rates and diversity, but how does the estimate change when randomly chosen genomic regions are used? If it is important to choose "preferable" regions according to the homogeneity of recombination rates and diversity, it may be useful to describe how these regions should be chosen for future applications of this method to other organisms.

    The genomic intervals used for the application to A. thaliana are indeed not random. They were defined so as to avoid, on each chromosome, the increased diversity observed at and around pericentromeric regions. This effect has already been described by Clark et al. (2007, Science); however, no explanation for this pattern has been published yet. We have updated the text, including a recommendation for future applications to other species, at p. 13 l. 8-15 and p. 18 l. 25-30, and in Figure S15. We have also replicated our analysis of the A. thaliana data using a different set of genomic intervals located outside pericentromeric regions (Figures S15 and S16).

    Reviewer #2 (Public Review):

    This submission seeks to detect changes in the rate of selfing through pairwise comparison of haplotypes sampled from a population. It begins, as did a previous paper by a subset of the authors (Sellinger et al. 2020), with the well-known theoretical finding that partial selfing increases the rate of coalescence and decreases the rate of crossing-over events in genealogical histories.

    I am supportive of pitching this contribution as primarily theoretical, with the very short discussion of the Arabidopsis data provided as a worked example. This perspective increases my enthusiasm, compared to an initial reading. My comments are intended to encourage development.

    Some thematic characteristics reduce the impact of the submission. Among these are:

    (1) a rather less than scholarly perspective on previous literature;

    (2) tendency to avoid theoretical development in favor of computation;

    (3) little interpretation of results of their only analysis of real data.

    We have now revised the manuscript along the lines suggested by reviewer 2. We provide more references when needed, have emphasized in the abstract and in the theoretical part of the manuscript that it is primarily a new theoretical/methodological development with an application to A. thaliana data, and have improved the interpretation of the A. thaliana data (see reply to reviewer 1).
