Polygenic adaptation after a sudden change in environment

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper is an impressive and deep look at a very important problem: understanding the genetic underpinnings of evolution acting on a quantitative trait. The authors analytically study the response to an abrupt shift in phenotypic optimum, in terms of both phenotype and genetic basis (how various alleles/loci contribute to this response). The basic assumptions are classic, but the methods and findings are new (especially finite population effects) and well supported by clear analytical approximations and extensive simulation checks. The main finding is that the relative contribution of large vs moderate effect alleles changes substantially and predictably over a long-term period after the shift, even though the phenotypic changes are already undetectable over this period.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their name with the authors.)

Abstract

Polygenic adaptation is thought to be ubiquitous, yet remains poorly understood. Here, we model this process analytically, in the plausible setting of a highly polygenic, quantitative trait that experiences a sudden shift in the fitness optimum. We show how the mean phenotype changes over time, depending on the effect sizes of loci that contribute to variance in the trait, and characterize the allele dynamics at these loci. Notably, we describe the two phases of the allele dynamics: The first is a rapid phase, in which directional selection introduces small frequency differences between alleles whose effects are aligned with or opposed to the shift, ultimately leading to small differences in their probability of fixation during a second, longer phase, governed by stabilizing selection. As we discuss, key results should hold in more general settings and have important implications for efforts to identify the genetic basis of adaptation in humans and other species.

Article activity feed

  2. Reviewer #1 (Public Review):

    The paper puts a lot of effort into many things that could make this work influential: the assumptions and parameter values under which the results hold are carefully examined, the approximations are difficult and carefully explained, the results are checked by simulations, and the underlying reasons for the results are explained in simple terms. In particular, the "linear" approximation is already enough for a good theoretical paper; the subsequent "nonlinear" approximation (which builds on the linear one) is very impressive. I am not certain precisely which results are new to this paper, but my impression is that it gives a much more complete picture of the details of polygenic adaptation than any previous work. The main limitation of the work is that it describes (and simulates) a large number of unlinked loci, but this is entirely appropriate and well discussed in the paper.

    My first observation is that, despite the authors' good attention to detail and effort to explain what's going on, I found this to be a difficult paper that took a lot of work to understand. (I did feel that the work paid off eventually, though.) However, this is not a serious criticism: the topic is complex, and the paper does a good job of explaining the big picture. The hardest thing for me to keep straight was the various layers of approximations (I think: linear Lande, nonlinear Lande, linear non-Lande, and nonlinear non-Lande, each within each of the two phases, plus three different types of simulation). If it were possible to remove discussion of some of these parallel tracks without removing important conceptual results, I think that would help. However, I have no concrete suggestions.

    Besides that, I have only one concern. The authors have put a lot of work into the simulations, but all plots show mean values, with no indication of between-simulation stochasticity. This makes sense, because the theory they develop describes mean quantities, but it would still be nice to know how well we expect the theory to predict the dynamics of a single given bout of adaptation. For instance, Figure 2 shows the mean trajectories of trait mean, variance, and skew. What is the typical path of these for a single simulation trajectory? Or, Figure 5 shows how alleles of different sizes are expected to contribute to adaptation. How much do typical contributions to adaptation in a single simulation differ? Showing just one or two examples in the supplement could help make things more concrete.

    Other comments:

    - The GitHub repository that is supposed to contain the code for the paper is empty. ( https://github.com/sellalab/PolygenicAdaptation1D )

    - Many of the plots (e.g., Figure 5) show "contributions" to adaptation plotted against S, on a log scale. But, isn't this a density with respect to effect size, and so shouldn't we read these plots as histograms, with relative area under the curves giving the relative contributions to adaptation? If so, the log scale could give a very wrong idea, and changing variables so the curve is a function of log(S) would avoid the problem.

    - It was hard for me to figure out a single set of simulation parameters to put into a forward simulator to match what was used in the paper, as the relevant information is scattered throughout (supplemental section 5.2 notwithstanding). To make things concrete, it would be nice to include a self-contained example somewhere. I think that with a genome of length L, typical parameters were N=1e4, u=0.01/L, V_s=2e4, and an Exponential distribution of effect sizes with mean 4, with + and - signs equally probable?

    - The agreement between simulations of allele frequencies and the "full" (still unlinked) model is impressive (see Figure C5.1)
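
    The log-scale comment above can be made concrete with a change of variables. This is only a sketch: the symbols c(S), for the contribution density per unit S, and \tilde{c}(x), for the density per unit x = ln S, are notation assumed here and are not taken from the manuscript.

```latex
\int_{S_1}^{S_2} c(S)\,\mathrm{d}S
  = \int_{\ln S_1}^{\ln S_2} \tilde{c}(x)\,\mathrm{d}x,
\qquad x = \ln S,
\qquad \tilde{c}(x) = S\,c(S) = e^{x}\,c\!\left(e^{x}\right).
```

    Under this reading, plotting S c(S) rather than c(S) against a logarithmic S axis would make equal areas correspond to equal contributions. Using log10 instead of ln only rescales the curve by the constant ln 10, so relative areas are unaffected.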

  3. Reviewer #2 (Public Review):

    Context:
    The authors propose a new analysis of an already well-studied conceptual model of adaptation to a new environment. Individual genotypes are characterized by some (breeding value for) phenotype under Gaussian stabilizing selection (meaning that fitness is a Gaussian function of phenotype, centered around some optimum value). The scenario assumed is that an isolated population of fixed size is initially at equilibrium (between mutation, selection and genetic drift). This population is diploid and sexual, with many unlinked loci acting additively on phenotype (across loci and between homologous chromosomes). This view simplifies the analysis but is also not inconsistent with various empirical analyses of locus-specific effects on quantitative traits (the empirical support is discussed and reviewed in both the introduction and the discussion).

    Then a change in the environment induces a shift in the optimum without affecting any other parameter (strength of selection, population size, mutation effects, existing phenotypes), see figure 1. We wish to know how the population responds to this change, both in terms of phenotype distributions, and the underlying genetic basis (how alleles of various effects change in frequency and contribute to the phenotypic response).
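
    To make this setup concrete, here is a minimal individual-based sketch of the scenario in Python. It is a toy illustration under stated assumptions and is not the authors' simulation code: all function names and parameter values are hypothetical and chosen only so that it runs quickly, there is no mutation, selfing is not excluded, and the starting population is random rather than at mutation-selection-drift equilibrium.

```python
import numpy as np


def gaussian_fitness(z, optimum, v_s):
    """Gaussian stabilizing selection: fitness is maximal at the optimum."""
    return np.exp(-((z - optimum) ** 2) / (2.0 * v_s))


def phenotypes(genotypes, effects):
    """Additive phenotype, summed over loci and over both homologous copies."""
    return genotypes.sum(axis=2) @ effects


def next_generation(genotypes, effects, optimum, v_s, rng):
    """One Wright-Fisher generation with unlinked loci (free recombination).

    genotypes: int array of shape (N, L, 2) holding 0/1 alleles.
    effects:   float array of shape (L,), effect of the '1' allele.
    """
    n, n_loci, _ = genotypes.shape
    w = gaussian_fitness(phenotypes(genotypes, effects), optimum, v_s)
    p = w / w.sum()
    # Parents are sampled in proportion to fitness (selfing is possible).
    mothers = rng.choice(n, size=n, p=p)
    fathers = rng.choice(n, size=n, p=p)
    # Each parent passes on one of its two copies, independently per locus.
    pick_m = rng.integers(0, 2, size=(n, n_loci))
    pick_f = rng.integers(0, 2, size=(n, n_loci))
    cols = np.arange(n_loci)[None, :]
    child = np.empty_like(genotypes)
    child[:, :, 0] = genotypes[mothers[:, None], cols, pick_m]
    child[:, :, 1] = genotypes[fathers[:, None], cols, pick_f]
    return child


def run(n=500, n_loci=200, v_s=50.0, shift=3.0, generations=60, seed=0):
    """Return the distance of the mean phenotype from the shifted optimum."""
    rng = np.random.default_rng(seed)
    # Illustrative effect sizes: exponential magnitudes with random signs.
    effects = rng.exponential(0.2, n_loci) * rng.choice([-1.0, 1.0], n_loci)
    freqs = rng.uniform(0.05, 0.95, n_loci)
    genotypes = (rng.random((n, n_loci, 2)) < freqs[None, :, None]).astype(np.int8)
    # The optimum jumps by `shift` away from the current mean at time zero.
    optimum = phenotypes(genotypes, effects).mean() + shift
    distances = []
    for _ in range(generations):
        distances.append(optimum - phenotypes(genotypes, effects).mean())
        genotypes = next_generation(genotypes, effects, optimum, v_s, rng)
    return np.array(distances)
```

    Calling run() returns one stochastic trajectory of the distance to the new optimum, starting at the size of the shift. Tracking per-locus allele frequencies alongside it would be the natural extension for looking at the genetic basis of the response.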

    This process has been at the core of the modelling of adaptation for more than a century, as it is maybe the most natural conceptual framework to describe adaptation to a new environment (a "niche shift" so to speak). It is relevant to both the study of demographic/ecological and phenotypic responses to changing conditions, and to the genomics of the changes associated with this process.
    However, in spite of this long history (reviewed in broad outline in the introduction), we do not have an exact mathematical description of this process. The reason is that the problem is in fact very complex: the genome is a sea of various genes, each bearing various alleles (depending on the individual), that further interact mutually by selection (even though loci are additive on phenotype), because fitness is not a linear function of phenotype. The simple population genetics of two alleles at one locus seems far away...

    I think it is fair to say that the main route to handle this problem, in predominantly sexual species, has been through the approximations of quantitative genetics. There, each locus is assumed to be of small effect, and linkage disequilibrium between loci is neglected. This has led to empirically testable, and often quite accurate, predictions on the response to selection in terms of mean phenotypic change. Yet, even under this broad approximation strategy, there are various ways to derive predictions, each neglecting one force or another (genetic drift most of the time), or looking at the process over short or longer timescales.
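
    As an illustration of the kind of prediction meant here, the textbook form of Lande's approximation for the mean phenotype is sketched below. The notation is assumed for this example and may differ from the manuscript's: \bar{z}_t is the mean phenotype in generation t, O the new optimum, D_t = O - \bar{z}_t the distance to the optimum, V_A the additive genetic variance (taken as constant), and V_s the width of the Gaussian fitness function, with selection assumed weak (V_s \gg V_A).

```latex
\Delta \bar{z}_t \approx \frac{V_A}{V_s}\,\bigl(O - \bar{z}_t\bigr)
\quad\Longrightarrow\quad
D_{t+1} \approx \Bigl(1 - \frac{V_A}{V_s}\Bigr)\, D_t ,
\qquad
D_t \approx D_0\, e^{-(V_A / V_s)\, t}.
```

    With V_A / V_s = 0.01, for instance, this form implies that the distance to the optimum halves roughly every ln 2 / 0.01 ≈ 69 generations.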

    Aim and achievements:
    The authors place their work within this broad framework, but set out to derive new approximations that are intended to cover several of the existing approaches as special cases, and especially to handle genetic drift effects in (large) finite populations, and short vs. longer timescales. I believe they succeed quite well in doing so: they provide clear approximation methods (mostly in the appendix) and substantial simulations to show their accuracy. The derivations are fairly technical, but most of the time the authors manage to give an intuition of where they come from and illustrate this intuition via figures in the main text. They produce predictions for two main observable dynamics: that of the (breeding value for) phenotype itself (its mean over time, variance, third moment), and that of the genetic contribution of various loci and alleles along the genome (depending on the allelic effect on phenotype). They also describe two timescales over which the dynamics are fairly different: a short timescale over which the mean phenotype shifts (quite rapidly, over tens to hundreds of generations) towards the new optimum, and a longer timescale over which the higher moments, and mostly the genetic basis, change while the mean phenotype merely wanders in a narrow vicinity of the new optimum. The connection between the two timescales is important, as it is the slight differences in allele fates during the first that result in differences in long-term behavior during the second (illustrated in figure 3).

    The main achievement on the phenotypic response is mostly to recover previous approximations under somewhat different or broader assumptions. This is not useless, as it may explain why these known predictions (the "Lande model") are surprisingly robust to deviations from the required conditions (e.g. figure 2). However, I think that some extra exploration of the parameter space (away from the required conditions) would make it possible to see when the Lande model does fail on mean phenotype dynamics over short timescales, as anticipated. Whether this range is relevant remains an open question for empirical measurement.
    Therefore, the main contribution of this ms is not on phenotypic responses but on the underlying genetic basis, and on what we may expect to observe when measuring QTLs or running GWAS between two populations separated by an environmental shift in the past: are there many loci each contributing a limited difference, or fewer loci contributing most of it? In that respect, eqs 20-21 and 25-27, and figures 5 and 6, display the main findings and their check by simulations. These findings, although stemming from quite elaborate derivations, yield a fairly simple and yet accurate outcome, at least in the parameter range studied. Various other parameter sets are also checked against simulations in the appendix, and the simulation code is made available for any further check (as exploring all the possible parameters is a fairly daunting task, probably for an article of its own).

    Limits:
    I believe the main limit of this work is fairly explained in the discussion: to achieve mathematical tractability (a full numerical treatment being inherently impossible given the many parameters), many simplifying assumptions must be made (simple fitness landscape, simple effect of the environmental change, simple demography etc.). This means that it is possible that empirical observations will differ from the predictions for various reasons. However, quantitative genetics has already proven reasonably robust and accurate in predicting observed phenotypic dynamics using comparable approximations, so it is not madness to hope that the same will happen concerning the genetic basis of adaptation. Also, I would suspect that the methods proposed in the appendix will most likely extend fairly easily to some deviations from the model's assumptions: a change in phenotypic variance with the new environment (a form of plasticity), in the width of the fitness function, or in the population size, without too much effect on the main conclusions. Still, some other limits may not be overcome as easily (e.g. pleiotropy among multiple traits, or a non-stationary optimum), but it seems (a priori) that part of the approach could still be adapted for these situations. The main "wall-hitting" limit of the paper is inherent in the very basis of the approach, namely assuming mild changes occurring at numerous, weakly linked, polymorphic loci as opposed to strong changes occurring at fewer, more tightly linked loci. These limits are all fairly described in the discussion.

    Overall, this paper is not an easy read, but not by lack of clarity, rather because the problem at hand is complex, and there is a lot of material to describe. Each part flows quite well in my opinion, but there are many parts to read.

    Potential impact:
    I believe that because it yields relatively simple analytic outcomes (at least the predictions in the main text), the paper could be useful for data analysis, mostly in the field of genomics of adaptation, where it may provide testable predictions for GWAS and QTL data. It could also be used to infer genetic distributions (v(a),f(a)) from observed QTL or GWAS data, if the model is deemed valid.

    In the field of theoretical population genetics, it may also provide a methodology to capture sexual adaptation dynamics in other contexts by mixing various approximation methods: connecting distinct timescales, and connecting deterministic approximations for phenotype with diffusion approximations for allele frequencies. This may not be the first time of course (see e.g. "stochastic house of cards" and their extensions), but it is used here in the context of adaptation dynamics rather than equilibria, for the first time I think.