Mutational robustness changes during long-term adaptation in laboratory budding yeast populations

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    Johnson and Desai previously reported "increasing cost epistasis", where mutations tended to have more deleterious effects in higher fitness backgrounds. Here they use the same system as before to investigate adapting populations by introducing a set of 91 mutations at multiple time points. As expected, the mean fitness effect of the mutations does decline in most (but not all) populations as they adapt, but the effect is weaker than in the previous work; in another condition, the mean fitness effects of mutations do not change as the populations adapt. They suggest an intriguing interpretation (among others) that the "control coefficient" of selection on growth shifts between different genetic modules over time.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


Abstract

As an adapting population traverses the fitness landscape, its local neighborhood (i.e., the collection of fitness effects of single-step mutations) can change shape because of interactions with mutations acquired during evolution. These changes to the distribution of fitness effects can affect both the rate of adaptation and the accumulation of deleterious mutations. However, while numerous models of fitness landscapes have been proposed in the literature, empirical data on how this distribution changes during evolution remains limited. In this study, we directly measure how the fitness landscape neighborhood changes during laboratory adaptation. Using a barcode-based mutagenesis system, we measure the fitness effects of 91 specific gene disruption mutations in genetic backgrounds spanning 8000–10,000 generations of evolution in two constant environments. We find that the mean of the distribution of fitness effects decreases in one environment, indicating a reduction in mutational robustness, but does not change in the other. We show that these distribution-level patterns result from differences in the relative frequency of certain patterns of epistasis at the level of individual mutations, including fitness-correlated and idiosyncratic epistasis.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The 2019 Johnson et al. Science study (referred to as the "2019 study" or "prior study" in the rest of these comments) measured mutational robustness in F1 segregants derived from a yeast cross between a laboratory and a wine strain, which differ at >35,000 loci. To accomplish this, the authors developed a pipeline 1) to create the same set of transposon insertion mutations in each yeast strain via transformation; and 2) to measure the fitness effects of these specific insertion mutations.

    In this manuscript, the authors applied the same pipeline to laboratory-evolved yeast strains that differ at only tens or hundreds of loci and thus are much less divergent than those used in the prior study. Both studies aim to characterize how the fitness effects of the set of insertion mutations (mostly deleterious) vary depending on the existing mutations (mostly beneficial) in those yeast strains. However, the current manuscript, especially when compared to the prior study, suffers from several major weaknesses.

    First, only 91 genes out of >6,000 genes in the yeast genome are perturbed in the manuscript. The small set of disruption mutations is unlikely to faithfully capture the pattern of epistasis in the selected clones. By comparison, >1,000 insertion mutations were evaluated in the 2019 study. Because the majority of the >1,000 tested mutations were neutral, the authors focused on 91 insertions that had significant fitness effects. The same 91 insertion mutations are used in the current study. However, as evident in both studies, epistasis plays an important role in how insertion mutations interact with different genetic backgrounds. Considering the vastly different genetic backgrounds between clones used in the prior and current studies, the insertion mutations of interest in the current study are unlikely to be the same as those in the prior study. I suggest that the large-scale insertion mutagenesis used in the prior study also be conducted in the current study.

    This concern is summarized in Essential Revision 1 above; see our comments there for our detailed response. Briefly, we have added an additional Figure Supplement (Fig. 1 – Supplement 8; see above) demonstrating that the 91 insertion mutants have a similar range of effects in this study as in the previous one (which may be expected since the genetic backgrounds here are as closely related to those in the 2019 study as the backgrounds in the 2019 study are to each other).

    Second, the statistical power in the current manuscript is insufficient to support the conclusions. Fitness errors were not considered when several main conclusions were drawn (fitness errors on the y-axis of Figure 1B are not available; fitness errors on the x-axis of Figure 2 are not available). The current conclusions are invalid without knowing the magnitude of fitness error. Fitness of each clone should be measured in at least two replicates in order to infer errors of fitness measurements. Additionally, the authors isolated two clones from the same timepoint of each population and treated them as biological replicates based on the fitness correlation between the two clones. However, this practice can be problematic because the extent of fitness correlation varies across populations and it is less likely to capture the patterns of epistasis when clones are isolated from more heterogeneous populations. Alternatively, the authors could avoid this bias by measuring the fitness of each clone in multiple replicates and treating the two clones from the same timepoint/population separately.

    We agree that details about statistical methods, most of which are taken from Johnson et al. (2019), were not clear in our text. As we also describe in our response to the Essential Revisions above, we have rewritten a large part of the methods text to provide more details about statistical methods and have calculated and reported errors more broadly:

    Errors on fitness effects: We have expanded our methods text describing how the fitness effect of a mutation is determined for a single clone / condition. This text now emphasizes the internal replication provided by redundant barcodes, which allows us to calculate a standard error for the effect of a mutation in a single clone / condition. These errors are shown in Figure 1 – Figure Supplements 1-3. We have also added details on how errors are calculated for a mutation for a population-timepoint, and these errors are now included in Figure 2.
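
    For concreteness, the core of this calculation can be sketched as follows (a simplified, stand-alone illustration rather than our exact pipeline; the log-frequency inputs, neutral-reference normalization, and toy numbers are assumptions made for the example):

        import numpy as np

        def fitness_effect_with_error(barcode_logfreqs, reference_logfreqs, generations):
            """Estimate one mutation's fitness effect and standard error from redundant barcodes.

            barcode_logfreqs: (n_barcodes, n_timepoints) log frequencies of the redundant
                barcodes carrying the same insertion mutation in one clone / condition.
            reference_logfreqs: (n_timepoints,) log frequency of a neutral reference.
            generations: (n_timepoints,) generations elapsed at each timepoint.
            """
            rel = barcode_logfreqs - reference_logfreqs      # log ratio to the neutral reference
            # per-barcode fitness effect = slope of the log ratio versus generations
            slopes = np.polyfit(generations, rel.T, deg=1)[0]
            effect = slopes.mean()
            # internal replication across redundant barcodes gives a standard error
            sem = slopes.std(ddof=1) / np.sqrt(len(slopes))
            return effect, sem

        # toy usage: 4 redundant barcodes measured at 3 timepoints
        gens = np.array([0.0, 10.0, 20.0])
        ref = np.log([0.010, 0.010, 0.010])
        bc = np.log([[0.0100, 0.0080, 0.0065],
                     [0.0120, 0.0100, 0.0080],
                     [0.0090, 0.0075, 0.0060],
                     [0.0110, 0.0088, 0.0072]])
        print(fitness_effect_with_error(bc, ref, gens))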

    Errors on the DFE mean: We discuss this below.

    Considering clones separately: As we also describe in the essential revisions above, Johnson et al. (2021) shows that the mutational dynamics in these evolving populations are dominated by successive selective sweeps, so we expect clones isolated from the same population-timepoint to rarely differ by many mutations. However, we agree that there are likely some cases in which the two clones have important genetic differences. To address this concern, we have reanalyzed our data as you suggest, considering each clone separately. The results of this analysis are included for every main text figure in the form of figure supplements (Figure 1 - figure supplement 7, Figure 2 - figure supplement 5, Figure 3 - figure supplement 5, and Figure 4 - figure supplement 1), which show that our qualitative conclusions are unchanged.

    Reviewer #2 (Public Review):

    Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effects on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast strains. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing cost epistasis is observed, it suggests that higher fitness backgrounds are less robust.

    In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in rich media at permissive temperature (YPD 30) and one in a defined medium at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection-and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

    The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study-giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

    The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: the mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.
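
    As a toy numerical illustration of this control-coefficient argument (an editorial sketch with arbitrary numbers, not a model from the paper): treat growth as set either by non-redundant serial components, where the slowest steps dominate, or by redundant components whose contributions add, and ask how costly the same disruption is before and after adaptation improves a different component.

        import numpy as np

        def serial_rate(components):
            """Non-redundant serial steps: the slowest components dominate (harmonic combination)."""
            return 1.0 / np.sum(1.0 / np.asarray(components, dtype=float))

        def redundant_rate(components):
            """Redundant components: contributions simply add (here, an average)."""
            return float(np.mean(components))

        def relative_cost(rate_fn, background, index, factor=0.5):
            """Relative fitness cost of multiplying one component by `factor` in a given background."""
            disrupted = list(background)
            disrupted[index] *= factor
            return 1.0 - rate_fn(disrupted) / rate_fn(background)

        ancestor = [1.0, 1.0, 1.0]
        evolved = [2.0, 1.0, 1.0]   # adaptation has improved component 0

        for label, rate_fn in [("serial (non-redundant)", serial_rate),
                               ("additive (redundant)", redundant_rate)]:
            before = relative_cost(rate_fn, ancestor, index=1)
            after = relative_cost(rate_fn, evolved, index=1)
            print(f"{label}: cost of disrupting component 1: {before:.3f} (ancestor) -> {after:.3f} (evolved)")

    In the serial case the same disruption costs more once another component has improved (control shifts onto the unimproved steps), whereas in the redundant case it does not, echoing the contrast drawn between the YPD 30 and SC 37 results.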

    After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs.

    We thank the reviewer for these positive comments and the nice summary of our work.

    As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

    Related points were also raised by the other reviewer. To address this, we have added multiple-hypothesis-corrected p-values for these least-squares Wald tests (using the Benjamini-Hochberg method) to our dataset (Supplementary File 1). As you suggest, for this particular analysis in which we compare the overall number of mutations following each pattern, we are willing to accept the possibility of false positives, so we still use the original p-values to categorize the mutations in Figure 2. We address this point in the main text and provide the numbers of mutations falling in each category after we perform this correction:

    “Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini Hochberg multiple-hypothesis correction these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.”
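
    For reference, the Benjamini-Hochberg adjustment used above can be reproduced in a few lines (a generic sketch with made-up p-values, not our analysis code); it matches, for example, statsmodels.stats.multitest.multipletests with method='fdr_bh'.

        import numpy as np

        def benjamini_hochberg(pvals):
            """Return Benjamini-Hochberg adjusted p-values for a 1-D array of raw p-values."""
            p = np.asarray(pvals, dtype=float)
            n = p.size
            order = np.argsort(p)
            scaled = p[order] * n / np.arange(1, n + 1)          # p_(i) * n / i
            # enforce monotonicity from the largest rank downward, then cap at 1
            adjusted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
            out = np.empty(n)
            out[order] = adjusted
            return out

        # toy example: raw Wald-test p-values for a handful of mutations
        raw = np.array([0.001, 0.012, 0.030, 0.048, 0.20, 0.81])
        print(benjamini_hochberg(raw))   # -> [0.006, 0.036, 0.060, 0.072, 0.240, 0.810]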

    From here the authors turn to formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.
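
    To make the structure of these three models concrete, the following is a schematic least-squares version (an illustrative reconstruction of the general setup as described, not the authors' implementation; the toy data, the interval-indicator encoding of the idiosyncratic terms, and the variable names are all assumptions):

        import numpy as np

        def fit_ols(X, y):
            """Ordinary least squares; returns coefficients and fraction of variance explained."""
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            return beta, 1.0 - resid.var() / y.var()

        # toy data: one insertion mutation measured in six successive backgrounds of one population
        background_fitness = np.array([0.00, 0.03, 0.06, 0.08, 0.10, 0.11])
        # indicator matrix: column j = 1 if the background carries the j-th fixed mutation / interval
        intervals = np.array([[0, 0, 0],
                              [1, 0, 0],
                              [1, 0, 0],
                              [1, 1, 0],
                              [1, 1, 1],
                              [1, 1, 1]], dtype=float)
        effect = np.array([-0.050, -0.062, -0.060, -0.041, -0.073, -0.070])  # measured fitness effects

        ones = np.ones((len(effect), 1))
        X_fitness = np.hstack([ones, background_fitness[:, None]])              # fitness-mediated
        X_idio = np.hstack([ones, intervals])                                   # idiosyncratic
        X_full = np.hstack([ones, background_fitness[:, None], intervals])      # full

        for name, X in [("fitness-mediated", X_fitness), ("idiosyncratic", X_idio), ("full", X_full)]:
            _, r2 = fit_ols(X, effect)
            print(f"{name:16s} variance explained: {r2:.2f}")

    Fitting each mutation separately, as the paper does, then asks how well each of these designs explains that mutation's trajectory of fitness effects across backgrounds.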

    My major criticism of the work lies here: the authors don't carefully and thoroughly explain how the models work, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

    The reason this matters is because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection-but as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. I am left asking how the idiosyncratic model would perform on data that arose, for example, under the fitness-mediated model? Or how it would perform on data originating under the full model? Is it true that when data arises under a different model (say the full model) but is fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients will always represent positive epistasis and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients-is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

    One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative-and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed-with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

    Thanks for these detailed comments about the modeling approach and analysis, which raise points that were also described in the Essential Revisions and by Reviewer 1. We agree that these details were not presented sufficiently clearly in the original manuscript. In the revised manuscript, we have added a much more in-depth section on the details of the modeling procedures in the Materials and Methods, including formulas for each model and a discussion of how noise could affect our modeling results (see responses to essential revisions and reviewer 1 above for more information). This includes an analysis of shuffled and simulated datasets, which will give readers a better sense of how to interpret these modeling results. We have also included a new paragraph in the results that compares the models for each mutation and for the entire dataset using the Bayesian Information Criterion (BIC):

    “We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.”
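
    For completeness, the BIC comparison for such least-squares fits can be computed as in the sketch below (a generic Gaussian-error formula applied to toy data, not our actual code):

        import numpy as np

        def bic_ols(X, y):
            """BIC of an ordinary-least-squares fit under Gaussian errors: n*ln(RSS/n) + k*ln(n)."""
            n, k = X.shape
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = np.sum((y - X @ beta) ** 2)
            return n * np.log(rss / n) + k * np.log(n)

        # toy comparison for one mutation: lower BIC = preferred after the parameter-count penalty
        y = np.array([-0.050, -0.062, -0.060, -0.041, -0.073, -0.070])
        x = np.array([0.00, 0.03, 0.06, 0.08, 0.10, 0.11])
        ones = np.ones_like(x)
        X_fitness = np.column_stack([ones, x])                                   # 2 parameters
        X_idio = np.column_stack([ones, (x > 0.02).astype(float),
                                  (x > 0.07).astype(float)])                     # 3 parameters
        X_full = np.column_stack([X_fitness, X_idio[:, 1:]])                     # 4 parameters

        for name, X in [("fitness-mediated", X_fitness), ("idiosyncratic", X_idio), ("full", X_full)]:
            print(f"{name:16s} BIC = {bic_ols(X, y):.2f}")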

    We also appreciate the suggestion to look at how coefficients are spread among mutations. We have made a new supplemental figure (Figure 3 - Figure supplement 3) that clearly shows the coefficients broken down by mutation for each condition. This figure shows that coefficients are often clustered for one mutation. That is, multiple populations often have similar coefficients / patterns of epistasis for a particular mutation. We don’t view this as a source of bias in our data, but as an indication that the mutations fixing in these populations sometimes exhibit similar patterns of epistasis with these insertion mutations. We now reference this supplemental figure in the main text (“see Figure 3 – figure supplement 3 for a breakdown of coefficients by individual mutations”) as a better representation of the coefficients that result from our modeling.
