Evolution of binding preferences among whole-genome duplicated transcription factors

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    The authors applied original approaches to explore the evolution of transcription factors following duplication and subsequent divergence among duplicates. This paper should generate broad interest among evolutionary biologists as it addresses the long-standing question of how newly evolved transcription factors acquire new binding specificity. By combining genome editing with high-precision DNA binding profiling, this study provides extensive in vivo data showing how the binding profiles of transcription factor paralog pairs diverge.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)


Abstract

Throughout evolution, new transcription factors (TFs) emerge by gene duplication, promoting the growth and rewiring of transcriptional networks. How TF duplicates diverge has been studied in only a few cases. To provide a genome-scale view, we considered the set of budding yeast TFs classified as whole-genome duplication (WGD)-retained paralogs (~35% of all specific TFs). Using high-resolution profiling, we find that ~60% of paralogs evolved differential binding preferences. We show that this divergence results primarily from variations outside the DNA-binding domains (DBDs), while DBD preferences remain largely conserved. Analysis of non-WGD orthologs revealed uneven splitting of ancestral preferences between duplicates, and preferential acquisition of new targets by the less conserved paralog (biased neo/sub-functionalization). Interactions between paralogs were rare and, when present, occurred through weak competition for DNA binding or through dependency between dimer-forming paralogs. We discuss the implications of our findings for the evolutionary design of transcriptional networks.

Article activity feed

  1. Author Response:

    Reviewer #3 (Public Review):

    In their manuscript entitled "Evolution of binding preferences among whole-genome duplicated transcription factors", the authors investigate the evolution of transcription factors following genome duplication, including the implications for the evolution of their binding sites.

    The hypothesis tested in this study is of great interest to a broad readership. Furthermore, the authors have carefully selected approaches that allow them to assess different aspects of the implications of gene duplication, including reciprocal knockouts of paralogues and DNA-binding domain swapping.

    One of my major criticisms concerns the writing and the figure preparation. The writing is extremely dense, making the arguments and results difficult to understand. Similarly, the figures have many panels at the cost of readability and ease of comprehension. Some of the supplementary figures, for instance, would be better suited to the main manuscript.

    We improved the figures and presentation. In particular, we subdivided three figures (1, 2 and 4) and improved the labeling and captions of each figure.

    The methods need attention; for instance, the parameters used to identify cut sites and to avoid noise are neither provided nor explained (NGS data processing).

    We expanded the method section to provide more details on our pre-processing procedures and other aspects of the analysis.

    "Promoter binding quantification": The authors summed the normalized genome coverage over the promoter region of each gene. By doing so, they remove potential inherent variability between replicates; the authors need to keep replicates separate and thereby gain confidence in their results.

    For all correlations, we now show in the main figures the mean and standard deviation of the correlations between individual repeats. Moreover, we added the individual repeats of the pbs z-scores and motif signals to Figures 5 and 6 (new Figures 8 and 9). Of note, the supplementary figures already showed the individual repeats for the critical analyses (i.e. correlation matrices).

    "Relative, gene-specific binding changes upon paralog deletion or DBD swapping": The authors select the signal from the 100 most strongly bound promoters. Why choose the top 100 rather than all promoters?

    This is because we do not expect TFs to bind all promoters, but only tens to hundreds of them. In the majority of promoters, the binding signal is weak, likely reflecting noise, and changes there are irrelevant. Of note, and for better consistency, we now define targets as in the other figures (i.e. promoters with a z-score above 3.5, or the top 50 promoters), without any significant changes. Please note that for this analysis we also consider the top 40 targets of the paralog pair, even if not all of them are among a given TF's targets. Finally, most of our conclusions are based on correlations, which do account for all promoters.
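    Two of the procedures discussed in these responses, replicate-level correlations and the target definition, can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Each replicate's coverage is assumed to be already normalized and stored as a per-base list, and promoters are hypothetical half-open (start, end) windows on that track. The z-scores use the population standard deviation across all promoters. Targets are read as the top 50 promoters, extended to every promoter with a z-score above 3.5. All function names are illustrative.

```python
import math
from itertools import combinations, product
from statistics import mean, pstdev, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def promoter_signal(coverage, promoters):
    """Sum one replicate's normalized coverage over each (start, end) promoter window."""
    return [sum(coverage[start:end]) for start, end in promoters]


def replicate_r(reps_a, reps_b=None):
    """Mean and SD of Pearson r over replicate pairs, keeping replicates separate.

    One sample: all pairs of its replicates (reproducibility).
    Two samples: every replicate of A against every replicate of B.
    """
    pairs = combinations(reps_a, 2) if reps_b is None else product(reps_a, reps_b)
    rs = [pearson(a, b) for a, b in pairs]
    return mean(rs), (stdev(rs) if len(rs) > 1 else 0.0)


def zscores(signal):
    """Z-score each promoter's signal relative to all promoters (population SD, assumed)."""
    mu, sd = mean(signal), pstdev(signal)
    if sd == 0:
        return [0.0] * len(signal)
    return [(s - mu) / sd for s in signal]


def define_targets(signal, z_cut=3.5, top_n=50):
    """Indices of target promoters: the top_n strongest plus all with z > z_cut."""
    z = zscores(signal)
    ranked = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    return set(ranked[:top_n]) | {i for i, zi in enumerate(z) if zi > z_cut}


# Usage with toy data: two replicates, three hypothetical promoter windows.
promoters = [(0, 2), (2, 4), (4, 8)]
rep1 = promoter_signal([0, 1, 2, 3, 4, 5, 6, 7], promoters)  # [1, 5, 22]
rep2 = promoter_signal([1, 1, 2, 2, 5, 5, 6, 6], promoters)  # [2, 4, 22]
r_mean, r_sd = replicate_r([rep1, rep2])
targets = define_targets(rep1, top_n=1)  # {2}
```

    The z-score is a monotone function of the signal. The union in `define_targets` is therefore equivalent, ties aside, to taking every promoter above the threshold, or the top 50 when fewer than 50 pass.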

    More information is needed on Fig. 2A: the authors need to indicate conservative and radical replacements. It is hard to see the impact of the highlighted changes.

    We updated the figure to better distinguish between amino acid changes that do not affect functional properties (charge, hydrophobicity) and those that do.
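    Below is a minimal sketch of how aligned DBD sequences could be annotated in this way. The grouping is an illustrative assumption, not the one used in the manuscript. A residue counts as hydrophobic if it has a positive Kyte-Doolittle hydropathy value (I, V, L, F, C, M, A). Lys and Arg are treated as positively charged, Asp and Glu as negatively charged, and His as neutral. A substitution is called conservative only if neither class changes. The function names are hypothetical.

```python
# Illustrative residue classes (assumptions, see lead-in).
HYDROPHOBIC = set("IVLFCMA")  # positive Kyte-Doolittle hydropathy
POSITIVE = set("KR")          # His treated as neutral
NEGATIVE = set("DE")


def charge(aa):
    """Return +1, -1 or 0 for a one-letter amino-acid code."""
    if aa in POSITIVE:
        return 1
    if aa in NEGATIVE:
        return -1
    return 0


def classify_substitution(a, b):
    """Label one aligned position as identical, conservative or radical."""
    if a == b:
        return "identical"
    same_charge = charge(a) == charge(b)
    same_hydro = (a in HYDROPHOBIC) == (b in HYDROPHOBIC)
    return "conservative" if same_charge and same_hydro else "radical"


def compare_aligned(seq1, seq2):
    """Classify each column of two aligned, equal-length sequences ('-' = gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return ["indel" if "-" in (a, b) else classify_substitution(a, b)
            for a, b in zip(seq1.upper(), seq2.upper())]


print(compare_aligned("IKSA", "LEST"))
# ['conservative', 'radical', 'identical', 'radical']
```

    Encoding each property as a separate class keeps the rule transparent. Other groupings, such as a substitution matrix like BLOSUM62, could be swapped in without changing the per-column loop.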

    "Consistent with our sequence analysis, DBD swapping perturbed binding for three of the four zinc-cluster TFs tested, although in none of these cases was DBD swapping sufficient for swapping promoter preferences (Fig. 2D,E). The fourth TF (Yrr1) remained largely invariant (Pearson's r>0.8) to DBD swapping, as did the eleven additionally tested TFs, taken from six different families. Of note, this invariance to DBD swapping characterizing most TFs was observed not only when comparing promoter preferences, but also when comparing in-vivo preferences to DNA 7-mers (Fig. 2D and fig. S2). We conclude that, for most duplicate pairs, the variations driving divergence". This section is very difficult to read

    We updated and simplified this section: “Consistent with their strong DBD sequence divergence, DBD swapping perturbed promoter binding for three of the four zinc cluster TFs tested. However, in none of these was DBD swapping sufficient for switching promoter preferences to those of the paralog from which the DBD was taken (Figure 4B,C). Further, in all other twelve cases studied, binding preferences remained largely invariant to the swapping of the DBD (Pearson's r>0.8). Of note, this invariance to DBD swapping was also observed when comparing in-vivo 7-mer DNA sequence preferences (Figure 4B and Figure 4-figure supplement 1). We conclude that, for most paralog pairs, the variations driving divergence in promoter binding preferences are located outside the DBDs.”
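    The comparison logic behind this paragraph can be sketched as below. The 0.8 cut-off is the Pearson threshold quoted in the text. The rule for calling a swap "switched" is a hypothetical criterion added for illustration: the swapped TF must correlate better with the DBD donor paralog than with its own wild type. Profiles are assumed to be promoter binding signals over the same set of promoters.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_swap(wild_type, swapped, donor, r_invariant=0.8):
    """Classify a DBD swap from promoter-binding profiles.

    wild_type: the TF with its own DBD; swapped: the same TF carrying the
    paralog's DBD; donor: the paralog from which the DBD was taken.
    """
    r_wt = pearson(swapped, wild_type)
    if r_wt > r_invariant:
        return "invariant"
    if pearson(swapped, donor) > r_wt:
        return "switched toward donor"  # hypothetical criterion
    return "perturbed"


wt = [1, 2, 3, 4, 5]
print(classify_swap(wt, [1, 2, 3, 4, 6], [5, 4, 3, 2, 1]))  # invariant
```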

    Page 6: "The fourth TF (Yrr1)": this transcription factor is not shown in the figures.

    We do not refer to this TF explicitly in the text now, and to simplify the figure we only annotate the exemplary DBD-swaps from Figure 2D (new Figure 4B).

    Page 7: "Two TFs completely lost binding signals (Pip2, Hms2)": how much is this impacted by the number of binding sites?

    We believe this result is independent of the number of binding sites, since we measure correlation between all promoters. In general, we do not detect a strong correlation (r=0.13) between the number of targets, i.e. promoters with a z-score>=3.5, and the effect of paralog deletion.
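    The check reported here can be sketched as follows: it correlates the number of targets with the effect of paralog deletion across TFs. Only the z ≥ 3.5 target threshold comes from the response. Two things are assumptions made for this illustration: measuring the deletion effect as 1 minus the Pearson correlation between wild-type and paralog-deletion profiles, and the input names and layout.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def n_targets(z, z_cut=3.5):
    """Count promoters with z-score >= z_cut."""
    return sum(1 for v in z if v >= z_cut)


def deletion_effect(wild_type, deletion):
    """Assumed effect size: 1 - r between wild-type and paralog-deletion profiles."""
    return 1 - pearson(wild_type, deletion)


def targets_vs_effect(tfs):
    """tfs maps a TF name to (z_scores, wild_type_signal, deletion_signal)."""
    counts = [n_targets(z) for z, _, _ in tfs.values()]
    effects = [deletion_effect(wt, dl) for _, wt, dl in tfs.values()]
    return pearson(counts, effects)
```

    Correlating over all promoters in `deletion_effect` is what makes the effect size largely insensitive to how many targets a TF has. This is the point made in the response.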

    Page 7: "Cooperative interactions were generally minor (e.g. Stp2), as were compensatory interactions (e.g. Pdr3 or Ecm22; Fig. 3C,D). Therefore, strong interactions between TFs paralogs are rare and existing ones tend to increase mutation fragility." Figure 3D highlights gene-specific variation upon domain swapping, yet the authors do not explore this.

    We agree. This is beyond the scope of the present manuscript and will be studied in future projects.

    Page 11: "Second, mutations within Rph1's DBD prevented its binding to Gis1-specialized sites, thereby reducing paralog interference (Fig. 6E)." Figure 6D suggests that Rph1 binds upon Gis1 knockout, and the difference in binding affinity upon swapping is not large; other variants are likely also involved.

    Please note that we base our statement not on a comparison of the DBD swap against the paralog-deletion strain, but on a comparison of the DBD swap against the wild-type background. We believe this is the right comparison, since the DBD swap was made in the wild-type background. In this comparison, the effect of the DBD swap is comparable to that of the paralog deletion. We now state the actual backgrounds explicitly in the figure caption.

  2. Evaluation Summary:

    The authors applied original approaches to explore the evolution of transcription factors following duplication and subsequent divergence among duplicates. This paper should generate broad interest among evolutionary biologists as it addresses the long-standing question of how newly evolved transcription factors acquire new binding specificity. By combining genome editing with high-precision DNA binding profiling, this study provides extensive in vivo data showing how the binding profiles of transcription factor paralog pairs diverge.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    Strengths: This is an elegant study that systematically assesses the divergence of duplicated genes encoding transcription factors that originated from the whole-genome duplication in yeast. It provides novel insight into the evolution of DNA-binding preferences, the contribution of different protein domains to promoter preferences, and protein interactions between paralogs. The experimental ChEC-seq datasets are a valuable resource for the community.

    Weaknesses: Transcription factor binding does not directly translate into transcriptional output. This is a limitation of the study, as it provides no immediate insight into the divergence of transcription factors at the level of actual transcriptional output.

  4. Reviewer #2 (Public Review):

    Gera, Jonas and colleagues used paralog pairs that derived from a whole genome duplication to investigate how newly evolved transcription factors gain new binding specificity using CRISPR/Cas genome editing and DNA binding profiling using high-throughput sequencing. This approach proved to be useful. They find that paralogous transcription factor pairs typically retained very similar binding profiles and occupied the same or similar sites, with few notable exceptions. Surprisingly, divergence in binding preference was driven mostly by sequence variation outside of the proteins' DNA binding domains. Only one transcription factor family (zinc cluster factors) repeatedly showed some influence of the DNA binding domain on binding preference. They also showed whether and how transcription factor paralogs influence the DNA binding of their respective partner: additively, cooperatively or competitively. For paralog pairs that did diverge in binding specificity, the authors show that divergence is typically a mix of sub- and neo-functionalization, in a model where pairs often retain some ancestral specificity and in addition gain new binding sites.

    The claims and conclusions of the paper are well supported by a wealth of data. Experiments were targeted strategically at addressing the authors' key questions. Probably the most surprising result comes from the observation that binding site divergence is mostly due to variation in sequences outside of the proteins' DNA binding domains. This is all the more surprising as some of the pairs show quite a few binding domain sequence differences. The authors provide strong evidence in favor of this observation by swapping DNA binding domains between paralogs. This shows that the backbone of the protein, and not the DNA binding domain, is mainly responsible for determining where a factor binds. One drawback of this observation is that it is not clear what exactly causes the divergence in binding specificity. It could be due to residues in the immediate proximity of the DNA binding domain, which could mean that they influence properties of the binding domain. This would mean that the DNA binding domains play a major role after all, albeit through the work of other residues. Another possibility is that binding specificity is determined by interaction with co-factors. The authors raise the possibility that other protein domains directly access DNA, which confers binding specificity. The data presented cannot resolve this and further experiments are needed. However, the observation that swapping the DNA binding domain between paralogs does not lead to a change in binding (in most cases) is a major, unexpected and interesting finding.

    The authors then address the big question of the relative importance of sub- vs neo-functionalization of protein duplicates. They compare the paralogs' binding profiles to orthologous factors from a species that did not undergo the genome duplication, providing an "ancestral" baseline. This comparison shows that one of the paralogs often retained a binding profile more similar to the ancestral state than to the other paralog. In most cases, the paralog that had undergone more protein sequence evolution showed the more divergent binding profile. While paralogs with little-diverged binding profiles generally bound the very same promoters as the "ancestral" ortholog, paralogs with more divergent profiles showed increasingly new binding sites, pointing to neo-functionalization as the main outcome at high levels of divergence. The caveat here is that many proteins did not diverge much at all, with the consequence that the sample size for this analysis is not large. Paralog pairs that did not diverge much at all must be considered "non-divergent" and hence uninformative for this question. This analysis appears to lack a clear definition of when a paralog pair is considered divergent enough to be assessed, which raises the potential issue of conflating non-divergence with sub-functionalization. This potential issue needs to be clarified.
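    The reviewer asks for an explicit divergence criterion. A sketch like the following makes that request concrete. Everything here is hypothetical. Divergence is measured as 1 minus the Pearson correlation between the two paralogs' promoter profiles, and the 0.2 cut-off is arbitrary. The less conserved paralog is taken to be the one correlating less with the non-WGD ortholog. Pairs below the cut-off are reported as non-divergent and set aside, so that non-divergence is not mistaken for sub-functionalization.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_pair(p1, p2, ortholog, min_divergence=0.2):
    """Gate a paralog pair on divergence before any sub/neo analysis.

    min_divergence is a hypothetical cut-off on 1 - r(p1, p2).
    """
    divergence = 1 - pearson(p1, p2)
    if divergence < min_divergence:
        return {"status": "non-divergent", "divergence": divergence}
    r1, r2 = pearson(p1, ortholog), pearson(p2, ortholog)
    return {"status": "divergent", "divergence": divergence,
            "less_conserved": "paralog1" if r1 < r2 else "paralog2"}


def new_target_fraction(paralog_targets, ortholog_targets):
    """Fraction of a paralog's targets that the ortholog does not bind."""
    if not paralog_targets:
        return 0.0
    return len(paralog_targets - ortholog_targets) / len(paralog_targets)
```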

    The authors present another exciting result, where the deletion of one of the paralogs leads to reduction (in two cases ablation) or expansion of the binding profile of the other paralog. The data presented are clear and immediately offer models of how the paralogs interact with each other to produce these binding profile changes. This experiment is insightful, but there are some caveats to keep in mind. Deletion of one paralog affected the binding of the other in only a few pairs, meaning that the sample size is rather small and it is unclear how general this pattern is. In addition, some of the factors might show similar effects upon deletion of other interaction partners. This is irrelevant to the divergence and interaction of paralog pairs, but it means that these data cannot be used to learn about the importance of binding partners for binding specificity in general terms.

    Overall this paper provides exciting insights based on extensive experimental data that advance our understanding of transcription factor binding preference evolution in both expected and unexpected ways.

  5. Reviewer #3 (Public Review):

    In their manuscript entitled "Evolution of binding preferences among whole-genome duplicated transcription factors", the authors investigate the evolution of transcription factors following genome duplication, including the implications for the evolution of their binding sites.

    The hypothesis tested in this study is of great interest to a broad readership. Furthermore, the authors have carefully selected approaches that allow them to assess different aspects of the implications of gene duplication, including reciprocal knockouts of paralogues and DNA-binding domain swapping.

    One of my major criticisms concerns the writing and the figure preparation. The writing is extremely dense, making the arguments and results difficult to understand. Similarly, the figures have many panels at the cost of readability and ease of comprehension. Some of the supplementary figures, for instance, would be better suited to the main manuscript.

    The methods need attention; for instance, the parameters used to identify cut sites and to avoid noise are neither provided nor explained (NGS data processing).

    "Promoter binding quantification": The authors summed the normalized genome coverage over the promoter region of each gene. By doing so, they remove potential inherent variability between replicates; the authors need to keep replicates separate and thereby gain confidence in their results.

    "Relative, gene-specific binding changes upon paralog deletion or DBD swapping": The authors select the signal from the 100 most strongly bound promoters. Why choose the top 100 rather than all promoters?

    More information is needed on Fig. 2A: the authors need to indicate conservative and radical replacements. It is hard to see the impact of the highlighted changes.

    "Consistent with our sequence analysis, DBD swapping perturbed binding for three of the four zinc-cluster TFs tested, although in none of these cases was DBD swapping sufficient for swapping promoter preferences (Fig. 2D,E). The fourth TF (Yrr1) remained largely invariant (Pearson's r>0.8) to DBD swapping, as did the eleven additionally tested TFs, taken from six different families. Of note, this invariance to DBD swapping characterizing most TFs was observed not only when comparing promoter preferences, but also when comparing in-vivo preferences to DNA 7-mers (Fig. 2D and fig. S2). We conclude that, for most duplicate pairs, the variations driving divergence". This section is very difficult to read.

    Page 6: "The fourth TF (Yrr1)": this transcription factor is not shown in the figures.

    Page 7: "Two TFs completely lost binding signals (Pip2, Hms2)": how much is this impacted by the number of binding sites?

    Page 7: "Cooperative interactions were generally minor (e.g. Stp2), as were compensatory interactions (e.g. Pdr3 or Ecm22; Fig. 3C,D). Therefore, strong interactions between TFs paralogs are rare and existing ones tend to increase mutation fragility." Figure 3D highlights gene-specific variation upon domain swapping, yet the authors do not explore this.

    Page 11: "Second, mutations within Rph1's DBD prevented its binding to Gis1-specialized sites, thereby reducing paralog interference (Fig. 6E)." Figure 6D suggests that Rph1 binds upon Gis1 knockout, and the difference in binding affinity upon swapping is not large; other variants are likely also involved.