Mutational sources of trans-regulatory variation affecting gene expression in Saccharomyces cerevisiae

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    One key question in evolutionary biology is how traits can be affected by spontaneous mutations. This relationship between traits and mutations influences the rate and direction in which traits evolve. Here, the authors map a set of mutations that affect the expression of a focal gene in yeast, and examine their individual effects and locations in the genome and in the regulatory network. The work is rigorous and the results are well presented. The findings will be of great interest for geneticists and evolutionary biologists interested in the evolution of gene expression and of complex traits. Additional analyses and discussions will strengthen the generalization of the conclusions.

Abstract

Heritable variation in a gene’s expression arises from mutations impacting cis- and trans-acting components of its regulatory network. Here, we investigate how trans-regulatory mutations are distributed within the genome and within a gene regulatory network by identifying and characterizing 69 mutations with trans-regulatory effects on expression of the same focal gene in Saccharomyces cerevisiae. Relative to 1766 mutations without effects on expression of this focal gene, we found that these trans-regulatory mutations were enriched in coding sequences of transcription factors previously predicted to regulate expression of the focal gene. However, over 90% of the trans-regulatory mutations identified mapped to other types of genes involved in diverse biological processes including chromatin state, metabolism, and signal transduction. These data show how genetic changes in diverse types of genes can impact a gene’s expression in trans, revealing properties of trans-regulatory mutations that provide the raw material for trans-regulatory variation segregating within natural populations.

Article activity feed

  1. Author Response:

    Reviewer #1 (Public Review):

    [...] The authors do a great job of listing and evaluating possible explanations, one of which is simply that the strains carry multiple mutations of small effect. All but one of the successfully mapped variants consists of missense and nonsense mutations. I think it's important to note that this represents a particular range of the effect-size distribution of mutations affecting the YFP phenotype. We know from the authors' earlier work that there are lots of mutations that can affect gene expression in cis, and so the absence of trans-acting cis-regulatory variants here is parsimoniously interpreted as due to their small effects. In general, work in other systems (particularly human genetics) has shown that even molecular traits are often hugely polygenic, affected by thousands of variants of tiny but non-zero magnitude. With a forward screen of the sort performed here, it's difficult to know how much of the phenotypic variance is due to unmapped small-effect variants, but two lines of evidence suggest it may be a lot: first, the absence of mappable causal mutations in 36/82 mutants, and second, the differences between EMS mutant strains and their matched single-site mutants. The authors commendably report and discuss these issues but to my mind they neglect them in drawing inferences and generalizations from their findings.

    We thank the reviewer for these encouraging comments and also appreciate the reviewer pointing out these concerns.

    With respect to the overlap between the trans-regulatory mutations we mapped and previously identified eQTL, we agree that the possibility of similar mapping biases in the two BSA-seq studies contributing to this overlap warrants further exploration. We interpret the reviewer’s comment to suggest that if some regions of the genome systematically showed lower sequencing read coverage (because of poor read mappability, PCR biases or any other reason), the power to detect trans-regulatory mutations and eQTL in these regions would be decreased compared to regions of the genome with higher coverage because the G-tests used to identify significant associations with expression in both studies are based on read counts. Consequently, variation in sequencing read coverage across the genome shared between this study and the prior study identifying eQTL, both of which used BSA-seq, could lead to the enrichment of trans-regulatory mutations in eQTL regions. Indeed, consistent patterns of read coverage across the S. cerevisiae genome have been observed in prior work.

    To determine whether trans-regulatory mutations were enriched in regions of the genome with higher sequence read coverage, we compared read coverage between regions of the genome identified as having trans-regulatory mutations or non-regulatory mutations. The identification of variants classified as non-regulatory is expected to be less dependent on the depth of sequencing read coverage because this designation does not require a statistically significant G-test. We found that the mutations identified as trans-regulatory showed 120x coverage whereas mutations identified as non-regulatory showed only 100x coverage, consistent with greater power to detect associations with expression in regions of the genome with higher sequencing read coverage. However, eliminating this difference in read coverage by excluding non-regulatory mutations with lower sequence read coverage did not eliminate the observed enrichment of trans-regulatory mutations in regions previously shown to contain eQTL. Non-regulatory mutations with higher and lower sequencing read coverage were also equally likely to be found within eQTL regions, suggesting that similar variation in sequence read coverage across the genome between the two studies is unlikely to explain the observed overlap of trans-regulatory mutations and eQTL. These analyses are now included in a new Figure 7-figure supplement 1.

    With respect to better incorporating biases in what we were able to map and considerations for extending findings from this work to other systems, we have tried to better address these issues in the revised discussion.
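
    The dependence of G-test power on read depth discussed above can be illustrated with a minimal sketch in Python. It is not the analysis pipeline used in the study: the function name g_test_2x2 and the allele counts are hypothetical, and the p-value uses the standard chi-squared approximation with one degree of freedom.

```python
import math


def g_test_2x2(table):
    """Return (G, p) for a 2x2 table of read counts.

    Rows: the two bulks (for example, low- and high-fluorescence segregants).
    Columns: reads carrying the reference allele vs. the mutant allele.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero count contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    # Chi-squared survival function for 1 degree of freedom: erfc(sqrt(G / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# Hypothetical counts: the same 60:40 vs. 40:60 allele split at two read depths.
low_depth = [[30, 20], [20, 30]]    # 50 reads per bulk
high_depth = [[60, 40], [40, 60]]   # 100 reads per bulk

g_low, p_low = g_test_2x2(low_depth)
g_high, p_high = g_test_2x2(high_depth)
print(f"50 reads per bulk:  G = {g_low:.2f}, p = {p_low:.4f}")
print(f"100 reads per bulk: G = {g_high:.2f}, p = {p_high:.4f}")
```

    Because G scales linearly with the read counts when allele proportions are held fixed, doubling the depth doubles G, so the same allele-frequency difference yields a smaller p-value in a better-covered region.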

    Reviewer #2 (Public Review):

    Fabien Duveau et al. sought to characterize mutations with trans-regulatory effects on expression of TDH3 by using EMS mutants carrying a TDH3 reporter in Saccharomyces cerevisiae. This work extends that of Gruber et al. (2012) and Metzger et al. (2016) by examining the effects of specific mutations on TDH3 expression. They found that the trans-regulatory mutations affecting expression of the TDH3 reporter were enriched in coding sequences of transcription factors. They also found that these trans-regulatory mutations are associated with natural trans-acting variants within S. cerevisiae. In summary, the data are well described and support their claims. The method could be used to study how regulatory networks work.

    [...] Although the paper does have strengths in principle, some weaknesses of the paper would affect the quality of the data presented. [...]

    We thank the reviewer for taking the time to evaluate this work and have the following responses to the weaknesses noted:

    1. The reviewer is correct that we focused this paper on trans-regulatory mutations because cis-regulatory mutations affecting TDH3 expression were previously characterized. Furthermore, long distance enhancers with cis-acting effects on expression have not been described in S. cerevisiae and the term promoter is commonly used to encompass both the basal (core) promoter (including a TATA box for some genes) as well as other upstream activating sequences (UAS) and upstream repressing sequences (URS). In other words, the cis-acting sequences for S. cerevisiae genes are confined to a particular region much more than in multicellular eukaryotes. In fact, our prior work with TDH3 (Metzger et al. 2015) showed that 97% of cis-acting variation affecting TDH3 expression could be explained by sequence variation in the 678 bp region we define as its promoter. Consequently, all mutations outside of this region were considered to have trans-regulatory effects on TDH3 expression. In the revised version, we extended the discussion to specifically compare the structure of regulatory sequences in S. cerevisiae to other eukaryotic model systems.

    2. In this study, a mutation is defined as trans-regulatory if it affects TDH3 expression and is not located in the TDH3 promoter, regardless of whether or not it also affects growth rate. In fact, mutations in RAP1 and GCR1 affect growth rate (Figure 5), but are clearly trans-regulators of TDH3 with well-established binding sites in the TDH3 promoter. In other words, we do not think that mutations should be discounted as having trans-regulatory effects because they also impact growth rate.

    3. (A) Prior work examining the statistics of BSA-seq has shown that G-tests are most appropriate because they take into account the independent sampling from two bulk populations inherent to bulk-segregant analysis (Magwene et al. 2011 PLOS Computational Biology). (B) We are guessing that the reviewer is asking about multiple testing corrections rather than post-hoc tests, as we used a false discovery rate correction for multiple tests in Figure 2-supplement 5A. Although we did not use a multiple test correction for the BSA-seq data, we used a conservative significance threshold of 0.001 that was expected to result in a 3.5% false positive rate. Perhaps more importantly, we functionally validated the effects of 40 of the 41 associated mutations tested. (C) We may indeed have been overly optimistic about mapping power when choosing mutants to analyze with BSA-seq given that the 36 EMS mutants for which we failed to find a significant association between a mutation and fluorescence tended to have smaller effects on PTDH3-YFP expression than the EMS mutants for which we observed one or more associated sites (Figure 3-figure supplement 3). The reviewer’s comment also made us realize that our original sentence referring to mapping power had reported the effect size for estimated RNA levels rather than fluorescence. To avoid confusion, and because our anticipated mapping power does not affect the results of the study, we deleted this statement from the revised manuscript. Regardless of our anticipated mapping power, we were ultimately able to map mutations that affected fluorescence by as little as 1.6% relative to the wildtype strain.

    4. The GO enrichment analysis was performed with widely used tools on www.pantherdb.org. The statistical significance of enrichment for each GO term was computed using Fisher’s exact tests that compared 1) the proportion of genes with non-regulatory mutations and 2) the proportion of genes with trans-regulatory mutations that corresponded to the tested GO term. Because the total number of genes identified in our study with trans-regulatory mutations (42 genes) was much lower than the total number of genes with non-regulatory mutations (1043 genes), it was possible to obtain strong and statistically significant enrichment (P < 0.05 in Fisher’s exact test) even if only a small number of genes corresponded to the GO term in both categories. Although we found a large number of enriched GO terms, these GO terms were not always independent from each other. For instance, GO:0009168 (purine ribonucleoside monophosphate biosynthetic process) and GO:0009167 (purine ribonucleoside monophosphate metabolic process) refer to the same biological process and contain the same genes. For this reason, even though we reported all enriched GO terms in Supplementary File 8, we only showed GO terms that were at the tips of different branches in the GO hierarchy in Figure 6 and we grouped GO terms in four main categories that together encompassed most genes with trans-regulatory mutations.

    5. We agree with the reviewer that trans-regulatory mutations can affect either the function of a gene product (including the ability of a transcription factor to bind to DNA) or the abundance of that gene product, but we do not think this is a weakness of the study. In fact, we think one of the strengths of the study is that we have empirical data testing the relative frequency of these two types of possible changes, finding that mutations in coding regions (presumably more likely to affect the function of the gene product than its expression) are the primary source of changes in TDH3 expression greater than 1%.

    6. The goal of the study was to characterize the effects of individual trans-regulatory mutations, thus we did not look at the combined effects of mutations in proteins that might work in a complex. We do, however, mention transcription factors working in a complex: “Transcription factors encoded by the TYE7 and GCR2 genes found to harbor trans-regulatory mutations affecting expression of PTDH3-YFP are known to regulate the expression of glycolytic genes (including TDH3) by forming a complex with transcription factors encoded by the RAP1 and GCR1 genes” (line 461). We think that looking at the combined effects of mutations that all impact the same complex of regulatory proteins is an interesting direction for future work.

    Finally, we’d like to point out that the reviewer’s statement in their opening summary about mutations being enriched in the coding sequence of transcription factors is not quite correct: the mutants we mapped were enriched in coding sequences, and we found more mutations in transcription factors previously shown to regulate (directly or indirectly) expression of TDH3 than expected by chance, but trans-regulatory mutations were not significantly enriched in genes encoding transcription factors relative to non-regulatory mutations (as described in the manuscript).
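
    The point made in response 4, that a small foreground gene set can yield significant GO enrichment from only a few annotated genes, can be sketched in Python as follows. This is an illustration under stated assumptions, not the PANTHER implementation: the function name fisher_enrichment_p is hypothetical, the test is taken to be one-sided (over-representation), and the per-term gene counts are invented. Only the set sizes (42 and 1043 genes) come from the text.

```python
from math import comb


def fisher_enrichment_p(k_fg, n_fg, k_bg, n_bg):
    """One-sided Fisher's exact p-value for over-representation of a GO term.

    k_fg of the n_fg foreground genes carry the term;
    k_bg of the n_bg background genes carry it.
    """
    total = n_fg + n_bg
    annotated = k_fg + k_bg
    # Hypergeometric upper tail: probability of drawing at least k_fg
    # annotated genes when n_fg genes are sampled from the combined set.
    upper = min(n_fg, annotated)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, n_fg - k)
        for k in range(k_fg, upper + 1)
    )
    return tail / comb(total, n_fg)


N_TRANS = 42     # genes with trans-regulatory mutations (from the text)
N_NONREG = 1043  # genes with non-regulatory mutations (from the text)

# Hypothetical GO term annotated to 3 foreground and 5 background genes.
p_value = fisher_enrichment_p(3, N_TRANS, 5, N_NONREG)
print(f"p = {p_value:.4g}")
```

    With these invented counts, three annotated genes out of 42 are compared against five out of 1043, which shows how two or three genes can drive a nominally significant term when the foreground set is small.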

    Reviewer #3 (Public Review):

    [...] The mutagenesis approach in yeast the authors used is very powerful, but it naturally has drawbacks. The regulatory landscape in yeast is arguably simpler compared to e.g. metazoa or plants, in that the cis-regulatory regions are predominantly closely linked to target genes, the genes in majority do not have introns and post-transcriptional regulation of mRNA through e.g. splicing is rare. These features distinguish the systems, as in animals and plants introns are a very prominent source of regulatory elements (close to half of all enhancers are intronic in many animals), and alternative splicing of e.g. transcription factors are known to play major roles in transcriptional regulation. Further, chromatin is a very important layer in metazoan and plant gene regulation. To benefit the general readership, it would be informative to further elaborate on the significance of the findings for researchers studying other organisms. In addition, it would help to clarify what aspects of the differences in the regulatory landscape the authors think are important to distinguish.

    We thank the reviewer for their kind words and recognition of the novelty of this work. We have modified the introduction to try to clarify the relationship of this work to eQTL studies, which we hope addresses the reviewer’s first concern. Specifically, we’ve tried to clarify that the complex, polygenic nature of trans-regulatory variation segregating within a species is well established by prior eQTL studies. We also sought to clarify that our work (which maps single mutations from mutagenized strains rather than natural variation) provides complementary insight into the distribution of regulatory mutations within the genome and within a gene’s regulatory network. Revisions have also been made to try to clarify that the single mutations we mapped were from EMS-induced mutants containing only ~24 mutations per genome, which is more than 1000-fold fewer than the number of single nucleotide polymorphisms between two strains of S. cerevisiae. That is, this study was designed to identify single trans-regulatory mutations rather than to characterize the genetic architecture of naturally occurring trans-regulatory variation. Although we intentionally focused on characterizing properties of single mutations here, we agree with the reviewer that testing for epistatic interactions among trans-regulatory mutations will be an interesting avenue for future work, and have added this point to the revised discussion. We have also added text to the discussion describing some similarities and differences in gene regulation among eukaryotes that should be considered when trying to generalize from this work.

  2. Evaluation Summary:

    One key question in evolutionary biology is how traits can be affected by spontaneous mutations. This relationship between traits and mutations influences the rate and direction in which traits evolve. Here, the authors map a set of mutations that affect the expression of a focal gene in yeast, and examine their individual effects and locations in the genome and in the regulatory network. The work is rigorous and the results are well presented. The findings will be of great interest for geneticists and evolutionary biologists interested in the evolution of gene expression and of complex traits. Additional analyses and discussions will strengthen the generalization of the conclusions.

  3. Reviewer #1 (Public Review):

    This paper aims to characterize mutations that act in trans to affect a single gene's expression in yeast. Trans-acting mutations potentially play an important role in variation and disease within species and in phenotypic evolution. The authors have previously described the mutational architecture and natural variation in the gene's cis-regulatory activity, creating a powerful experimental model for the causes of phenotypic variation. Trans-acting variation is much more challenging, because the mutational target space is the whole genome. The authors use a forward-genetic screen and bulked segregant analysis to identify 52 point mutations that affect their focal transgene's activity, and they identify an additional 17 by directly searching within a handful of candidate genes. The paper includes elegant validation using genome engineering to confirm the mapped variants are causal.

    With this collection of trans-acting mutations in hand, the authors can compare their characteristics to a set of mutations that do not have detectable effects on the transgene. Overall, they conclude that trans-acting mutations are enriched for genes that are known to sit upstream of the focal gene in transcriptional cascades, but the majority of mutations are in other kinds of genes, outside the network of transcription factors.

    This work is a valuable contribution to the authors' important experimental assault on the genetics of regulatory variation, a useful complement to their previous work on cis-regulatory mutations and polymorphisms. They provide evidence that experimentally defined regulatory networks have predictive value for the location of trans-acting mutations, and they reinforce the result (well established and widely accepted, but important to show in this kind of rigorous way) that trans-acting variation is distributed across a wide range of cellular and molecular functions. There are also some useful fine-grained results, such as the absence of mutations in a known regulator, RAP1, probably due to pleiotropic constraints, and an excess of mutations in iron homeostasis.

    Because the dataset of trans-acting mutations is relatively modest in size (necessarily- it's a heroic effort to identify this many), many of the enrichments are also modest. In particular, the finding that mutations are enriched in eQTL regions holds for only two of three previous eQTL studies, and involves a slight elevation over the baseline that 66% of the genome is in eQTL regions. Because both the eQTL and the mutations were discovered by bulked-segregant analysis, biases in mappability will affect both similarly, and so I do not find the enrichment for overlapping hits to be completely persuasive.
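
    As a rough illustration of the point above about a slight elevation over a 66% baseline, the sketch below computes a one-sided binomial tail probability in Python. The counts of mutations falling in eQTL regions are hypothetical (only the total of 69 mutations and the 66% baseline come from the text), and the binomial model is an assumption made for illustration rather than the test used in the study.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of observing at least k successes in n trials at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BASELINE = 0.66   # fraction of the genome in eQTL regions (from the text)
N_MUTATIONS = 69  # trans-regulatory mutations identified (from the text)

# Hypothetical numbers of mutations that fall inside eQTL regions.
for in_eqtl in (46, 52, 58):
    tail = binomial_upper_tail(in_eqtl, N_MUTATIONS, BASELINE)
    share = in_eqtl / N_MUTATIONS
    print(f"{in_eqtl}/{N_MUTATIONS} ({share:.0%}) in eQTL regions: P(X >= k) = {tail:.4g}")
```

    With a baseline this high, about 45 or 46 of 69 mutations are expected in eQTL regions by chance alone, so an observed count has to sit well above that before the tail probability becomes small.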

    This work is important in substantial measure because of its contribution to the larger yeast TDH3 model trait project, which is a landmark research program for understanding phenotypic variation and evolution. On its own, the results in this manuscript would be difficult to generalize to regulatory variation more broadly. There are narrow reasons for this (yeast has a distinctive compact CDS-dense genome; the focal transcript is YFP and so has no endogenous post-transcriptional regulation; only one class of mutations assayed), but the bigger reason is that the researchers are only able to discover mutations with effects above a particular size. Even among the 82 mutant strains they start with, some 36 strains have altered YFP levels but no successfully mapped causal variants. The authors do a great job of listing and evaluating possible explanations, one of which is simply that the strains carry multiple mutations of small effect. All but one of the successfully mapped variants consists of missense and nonsense mutations. I think it's important to note that this represents a particular range of the effect-size distribution of mutations affecting the YFP phenotype. We know from the authors' earlier work that there are lots of mutations that can affect gene expression in cis, and so the absence of trans-acting cis-regulatory variants here is parsimoniously interpreted as due to their small effects. In general, work in other systems (particularly human genetics) has shown that even molecular traits are often hugely polygenic, affected by thousands of variants of tiny but non-zero magnitude. With a forward screen of the sort performed here, it's difficult to know how much of the phenotypic variance is due to unmapped small-effect variants, but two lines of evidence suggest it may be a lot: first, the absence of mappable causal mutations in 36/82 mutants, and second, the differences between EMS mutant strains and their matched single-site mutants. The authors commendably report and discuss these issues but to my mind they neglect them in drawing inferences and generalizations from their findings.

  4. Reviewer #2 (Public Review):

    Fabien Duveau et al. sought to characterize mutations with trans-regulatory effects on expression of TDH3 by using EMS mutants carrying a TDH3 reporter in Saccharomyces cerevisiae. This work extends that of Gruber et al. (2012) and Metzger et al. (2016) by examining the effects of specific mutations on TDH3 expression. They found that the trans-regulatory mutations affecting expression of the TDH3 reporter were enriched in coding sequences of transcription factors. They also found that these trans-regulatory mutations are associated with natural trans-acting variants within S. cerevisiae. In summary, the data are well described and support their claims. The method could be used to study how regulatory networks work.

    Strengths:

    This work provides a new, general view of the trans-regulatory network acting on a specific focal gene. It confirms that trans-regulatory mutations with effects on their targets are often located in coding sequences, and that they are correlated with natural variation within yeast. This would provide insight into the evolution of trans-regulation, and into functional analysis of TFs as well.

    This paper provides a model that predicts the effects of trans-regulatory changes on the expression of one specific focal gene. The technique in principle works in the same way across many different types of gene regulatory networks.

    BSA-seq and permutation tests were applied to identify the effects of single-site mutations on the focal gene in mutated strains. The method is well designed for identifying the effects of trans-regulatory mutations on expression of the focal gene.

    I appreciate that the authors provide the raw data and data processing in the supplements. This helps readers who are interested in this work dig into the data in detail.

    Weaknesses:

    Although the paper does have strengths in principle, some weaknesses of the paper would affect the quality of the data presented. In particular:

    1. The authors did a lot of analyses on trans regulatory mutations on TDH3, however, no cis-regulatory mutation was discussed. In the previous work of Metzger et al. (2016), there are 235 cis mutations, and their effects on expression of TDH3 are even stronger than trans mutations. Secondly, the authors only consider the promoter of TDH3 as cis regulation. However, cis regulation should include enhancers and repressors as well.

    2. Since some mutations would change the development or growth rate of yeast, and thereby change expression of the focal gene, it is hard to call this kind of change a trans-regulatory effect.

    3. The statistics in this paper have some weaknesses. A) The methods do not explain why G-tests were chosen. Why are they better than other tests, especially the chi-squared test? B) If more than two categories are tested for significance against their null expectation, a post-hoc test should be performed. However, I only see such a test in Figure 2-supplement 5A. C) Is the 1% threshold for expression difference too sensitive? I read the paper cited for "altering fluorescence by 1% or more (Duveau et al., 2014)" on page 4, line 162, and cannot find what the threshold is based on.

    4. The result of GO analysis seems robust. The number of significant genes for GO is rather small, which would bias the GO analysis. Most of the enriched GO terms presented have only 2-3 significant genes, meaning that only a few genes have large effects on enrichment. Suppl. Table 8 shows more than 150 enriched BP GO terms.

    5. There are two dimensions of activity change in trans regulation caused by mutations: one is the ability to bind (a structural change) and the other is the abundance of transcription factors.

    6. Although many transcription factors work as complexes, this study considers each trans regulator independently.

  5. Reviewer #3 (Public Review):

    Duveau et al. present a dissection of the genetic architecture and mutational spectrum of new unlinked (trans acting) variation influencing the expression of a single gene. By using a smart approach that combines a promoter of interest, a reporter gene, mutagenesis and identification of regulatory variation influencing target gene expression by fluorescence sorting, the authors trace the source of regulatory variation in trans factors. The results bring significant novel insight into the architecture of regulatory networks. Regulatory networks have been dissected by using a variety of approaches, including mapping of expression variation (eQTL), targeted functional test and the approach used here, mutagenesis. The mutagenesis approach is useful to discover novel variation not yet filtered by natural selection or drift, meaning that it is useful to trace regulatory interactions in a "neutral" setting. Previous research has been very successful to study regulatory networks but has one clear weak point; the architecture and source of trans acting mutations have been hard to study, either because of technical/statistical constraints (e.g. eQTL's), or because of low scalability (targeted approaches). The present work fills a significant gap in our understanding of the nature of trans effects by identifying and functionally testing a large number of trans acting mutations. The results broaden our understanding of the mutational sources of trans variation.

    Overall, the manuscript is very extensive, the results are clearly explained and the conclusions are well supported by the data. I have identified a few general aspects in which the manuscript could be improved, explained below.

    The authors posit that "we know comparatively little about the genomic sources, molecular mechanisms of action and evolutionary contributions of individual trans-regulatory mutations." I generally agree with this point. However, the genetic architecture of expression variation has been extensively studied using genetic mapping (in contrast to mutagenesis used here), with a few significant insights into the architecture of trans-acting variation. I think elaborating on the strengths/weaknesses and commonalities/differences between mutagenesis versus eQTL strategies would benefit the reader, as well as recognising what ground has already been covered by eQTL approaches, and what is the novelty of the authors' work.

    The architecture of the trans-acting mutations was mostly simple; most trans-effects were associated with single SNPs, with ~37% (i.e. 17/46, if I have interpreted the results correctly) of trans-effects being due to multiple mutations. This finding can appear surprising, as trans-effects are often perceived in the literature as "complex". It would be beneficial to note that eQTL studies commonly find single genes influenced by many (putatively) trans-acting loci, however, this is distinct from the findings here. In this experiment, 69 loci influence Ptdh3-YFP expression in trans (which is consistent with the "complex" architecture of trans effects in eQTL studies). What is shown here is that the majority of these trans effects are single loci. These findings highlight the novelty gained from this experiment; the resolution in eQTL studies rarely allows identification of the individual mutations and their number attributed to a trans (or cis) effect. Furthermore, the fact that more than a third of trans effects are associated with more than one SNP is intriguing. If I have interpreted the results correctly, the authors test the effect of individual mutations (which is a notable feat in itself) to conclude that in such cases there are single SNPs that account for the association, the rest being driven to high frequency by linkage disequilibrium. The testing of SNP effects is however done on an individual-SNP basis. This choice leaves the possibility of multiple SNPs acting in association to change expression in trans (i.e. epistatic effects). It would benefit the reader to clarify if it is possible to test two mutations at once to start with, and if it is, what was the reasoning behind not doing so. Further, because I consider this analysis to give novel insight into the genetic architecture of trans-effects not presented in earlier analyses, I'd encourage the authors to elaborate on the possibility of epistatic trans effects.

    The mutagenesis approach in yeast the authors used is very powerful, but it naturally has drawbacks. The regulatory landscape in yeast is arguably simpler compared to e.g. metazoa or plants, in that the cis-regulatory regions are predominantly closely linked to target genes, the genes in majority do not have introns and post-transcriptional regulation of mRNA through e.g. splicing is rare. These features distinguish the systems, as in animals and plants introns are a very prominent source of regulatory elements (close to half of all enhancers are intronic in many animals), and alternative splicing of e.g. transcription factors are known to play major roles in transcriptional regulation. Further, chromatin is a very important layer in metazoan and plant gene regulation. To benefit the general readership, it would be informative to further elaborate on the significance of the findings for researchers studying other organisms. In addition, it would help to clarify what aspects of the differences in the regulatory landscape the authors think are important to distinguish.