Exploration of the structural and functional diversity in the metamorphic RfaH subfamily

Abstract

RfaH, a member of the universally conserved NusG/Spt5 family, activates transcription and translation of virulence genes by bridging the transcribing RNA polymerase to the translating ribosome. Unlike its monomorphic, constitutively active paralog NusG, RfaH fold-switches from an autoinhibited state, wherein its C-terminal domain forms an α-helical hairpin bound to the N-terminal domain, into an active NusG-like state through C-terminal domain dissociation and refolding into a β-barrel. RfaH has been proposed to evolve from NusG via gene duplication by first activating genes adjacent to its production site, before developing autoinhibition to selectively control distant genes. Using AlphaFold2, we predicted the structures of thousands of RfaH homologs and identified a subset (∼14%) that appear predominantly folded in the active state. In vivo assays confirmed that these putative monomorphic homologs exhibit constitutive activity comparable to known RfaH mutants. Phylogenetic and genomic analyses revealed that these monomorphic proteins form a distinct clade and are preferentially located within or next to long virulence-associated operons. Together, these results further support a stepwise evolutionary model of RfaH specialization through structural transformation.

Article activity feed

  1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

    Reply to the reviewers

    RESPONSE TO REVIEWERS

    Reviewer #1 (Evidence, reproducibility and clarity (Required)):

    Summary:

    This is an interesting and ambitious study by Tabilo-Agurto and co-workers. It combines deep learning structure prediction (AlphaFold2), targeted molecular dynamics simulations, and in vivo functional assays to probe structural, functional, and evolutionary aspects of the metamorphic protein RfaH. More broadly, the work addresses an important question: whether intermediate structural states may exist along evolutionary trajectories of metamorphic proteins. A particular strength of the study is the integration of computational and experimental approaches. The manuscript is generally well written and clearly organized.

    Major comments:

    A key aspect of the study is the classification of predicted structures into three classes based on the conformation of the C-terminal domain (CTD): the autoinhibited alpha-helical fold, a beta-barrel fold, and a mixed alpha/beta fold. These classes are further described as corresponding to metamorphic (alpha fold), mixed alpha/beta, and monomorphic (beta fold) proteins.

    While I can see how this organizational scheme is helpful in some respects, it may also overstate what can be concluded from the data. As the authors are well aware, AlphaFold2 tends to predict a single conformation even for genuine metamorphic proteins, and therefore does not, on its own, distinguish between monomorphic and fold-switching proteins. I note in particular that the functional data indicates that the "monomorphic" variants studied in the in vivo assays behave similarly to the RfaH E48A mutant. However, E48A is known to remain metamorphic, populating both alpha and beta folds with roughly equal probability. This suggests that the sequences in this class may retain some degree of fold-switching capability, even if the underlying regulatory mechanism differs from that of wild-type RfaH. In other words, the presented data does not fully support these sequences as monomorphic. I am not suggesting that the authors must revise their classification scheme. However, it may strengthen the manuscript if the authors explicitly acknowledge this alternative interpretation and moderate the corresponding claims.

    We appreciate the comment from the reviewer, which can be seen from two different perspectives.

    On the one hand, it might be reasonable to think that the ‘monomorphic’ RfaH orthologs have lower transcription elongation activity than E. coli RfaH. Other highly divergent orthologs of E. coli RfaH (from Salmonella enterica serovar Typhimurium, Klebsiella pneumoniae, Yersinia enterocolitica and Vibrio cholerae) show similar *in vitro* recruitment and pausing at the C45 nucleotide from the ops element, and restore the RfaH-dependent hemolytic activity of an E. coli strain that lacks chromosomal RfaH to levels similar to the wild-type strain (doi: 10.1128/jb.186.9.2829-2840.2004). However, V. cholerae RfaH (43% sequence identity to E. coli RfaH) exhibits diminished antitermination effects in in vitro transcription assays, better resembling the antitermination levels in the absence of RfaH (doi: 10.1128/jb.186.9.2829-2840.2004), despite this protein also being predicted in the alpha-folded state when using AF2 (doi: 10.1016/j.csbj.2022.10.024). A particular observation from the RfaH complementation work is that increasing the concentration of V. cholerae RfaH in in vitro transcription assays lessens the transcription elongation defects observed when using concentrations similar to E. coli RfaH. These transcription elongation defects can be extrapolated to potentially similar issues with transcription in vivo and, therefore, with luciferase translation in our in vivo translation assays for our ‘monomorphic’ proteins.

    On the other hand, it is possible that these so-called ‘monomorphic proteins’ still populate the alpha-folded state, but that their predominant fold in solution is the one corresponding to the active beta-fold. This can be biophysically tested using circular dichroism to distinguish their alpha or beta propensity, as proposed in a remarkable work from Porter *et al.* (doi: 10.1038/s41467-022-31532-9).

    In both cases, quantification of the protein titers obtained after attempts of protein purification of the ‘monomorphic’ RfaH orthologs would be required. In this way, we can ascertain whether the differences in activity are due to differences in expression levels and determine if sufficient amounts of stable and well-folded protein can be obtained for these RfaH orthologs, followed by measuring their circular dichroism spectra to ascertain their secondary structure propensity.

    Our current attempts are to recombinantly express these proteins for determining their protein titers and solubility in the supernatant, which will enable us to indirectly ascertain their expression levels, and test those solubly expressed proteins biophysically using circular dichroism experiments. If the circular dichroism experiments prove to be unsuccessful due to problems with the solubility of the purified proteins, we strongly believe that the aforementioned discussion should be included in the manuscript to take into account the limitations of the methods utilized in our work.

    Therefore, we will add the following paragraph in the discussion, while we work on ascertaining the feasibility of the circular dichroism assays:

    “It is worth noting that, in the absence of RBS (Figure 5C-F), the putative monomorphic RfaH orthologs have in vivo activity similar to or lower than that of the E. coli RfaH E48A mutant; a similar mutant (E48S) exhibits a 1:1 equilibrium between the autoinhibited and active states (Burmann et al, 2012). This observation can be partly explained by two factors. First, sequence divergence and expression levels may limit functional compatibility with the host machinery. The highly divergent V. cholerae RfaH ortholog, which shares only 43% sequence identity with E. coli RfaH but is predicted to fold into the autoinhibited state (Artsimovitch & Ramírez-Sarmiento, 2022), maintains both ops-dependent recruitment and hemolysin secretion in the ∆rfaH E. coli strain, yet exhibits transcription elongation defects in vitro, requiring a 5-fold higher concentration than E. coli RfaH to match the increased elongation rates of E. coli RNAP (Carter et al, 2004). Low in vivo protein titers or structural mismatches between the monomorphic orthologs and E. coli RNAP may prevent higher luciferase expression relative to the E48A mutant. This limitation is supported by the fact that IPTG-induced overexpression rescues activity when an RBS is present (Figure 5B). Second, these proteins may be predominantly folded in the active state while still transiently populating the autoinhibited state. Confirming this conformational equilibrium would require overexpression and purification of these proteins followed by biophysical assays, such as circular dichroism (Porter et al, 2022).”

    Reviewer #1 (Significance (Required)):

    An intriguing, but speculative, aspect of the study is the finding that some sequences are predicted to adopt a CTD with mixed alpha/beta secondary structure, and that such structures also appear in targeted molecular dynamics simulations. If this idea holds up, it could represent an intermediate along the evolutionary pathway between the alpha-helical and beta-barrel folds of RfaH. Although the evidence is only computational, it is a compelling idea and it would benefit from further investigation.

    It is indeed very compelling, and this is something that we should immediately address in a revised version of our manuscript. We missed an article published in 2025 that used NMR to study the structural interconversion of the isolated CTD and found at least three intermediate states along the fold-switching pathway of RfaH (doi: 10.1073/pnas.2506441122). One of the intermediate states observed, which is also one of the most highly populated (~23%), corresponds to an ensemble of largely unfolded structures that include the formation of transient alpha-helix a5 (corresponding to helix a2 in our article) and beta-hairpin (b1/b2) secondary structure elements, which fold to form a compact ensemble of structures in which the beta-hairpin lies on top of the alpha-helix. This is fully consistent with our predictions of a mixed alpha/beta state in full-length RfaH.

    We will add this external experimental validation of the mixed alpha/beta secondary structure of the CTD of RfaH in the discussion of our final manuscript:

    “Interestingly, a recent nuclear magnetic resonance spectroscopy study of the E. coli RfaH CTD, aimed to uncover transient states potentially en route of the αCTD interconversion (Cai et al, 2025), described an intermediate state (populated in ~23% of the captured ensembles) in which a β-hairpin formed by β-strands β1-β2 lies on top of a transient α-helix α5 that corresponds to helix α2 in our article. This finding is fully consistent with the mixed α/β CTD structures found both in our TMD simulations and our AF2 predictions of divergent RfaH orthologs.”

    In summary, the work is a valuable contribution to the field of protein fold switching. The combination of computational tools with experimental validation makes it interesting and the results should be of broad interest. The manuscript should be well positioned for publication in a high-impact journal.

    We are very thankful for the reviewer’s comments on our manuscript.

    Reviewer #2 (Evidence, reproducibility and clarity (Required)):

    Summary:

    In their paper "Exploration of the structural and functional diversity in the metamorphic RfaH subfamily," Tabilo-Agurto et al. use AlphaFold2 to predict the structures of ~3,900 RfaH homologs, sort the predicted C-terminal domains into α-helical (autoinhibited), β-barrel (NusG-like), and mixed α/β topologies, and find that about 14% of homologs come out predominantly in the β-barrel state. They then take nine representative homologs and run them through a heterologous *E. coli* DH5α Δ*rfaH* reporter assay. The putative monomorphic candidates behave a lot like the constitutively active E48A variant - active across every ops context and even without an RBS - while the mixed α/β candidates barely show activity. Targeted MD simulations of *E. coli* RfaH, run through AF2Rank, also pull out the mixed α/β state as its own distinct cluster, hinting that it sits somewhere along the fold-switching transition path.

    This is a genuinely interesting piece of work that pulls together structure prediction, in vivo activity, and genomic context to make a concrete case for extant monomorphic βRfaH proteins - a long-hypothesized but until now unseen intermediate in the proposed stepwise evolution of RfaH from NusG. The experimental design is thoughtful, especially the five-construct ops/RBS matrix, and comparing the monomorphic candidates against the E48A benchmark is a nice touch as a positive control. Overall, I think the paper deserves to be published, but a few things would need shoring up before acceptance.

    Major comments:

    1. The paper would be a lot stronger with at least one biophysical measurement (a CD spectrum, say) on a purified monomorphic candidate. I get that this might be outside the planned scope, but even a single CD trace showing β-rich content for an isolated full-length protein would move the claim from "putative" to "demonstrated."

    We agree with the comment from the reviewer, and as such we are currently attempting to recombinantly express these proteins for determining first their solubility after purification (which will largely determine our ability to characterize them by circular dichroism) and then follow up with circular dichroism experiments if the solubility and protein concentration of these ‘monomorphic’ homologs is sufficient to pursue these experiments. In case this is unfeasible, we will include the solubility analysis in our revised version of the article, as well as a discussion on this topic – and also on the topic of why the activity of the ‘monomorphic’ proteins resembles the E48A mutant of E. coli RfaH that co-exists between two folds – as indicated in our response to the major comment from reviewer #1.

    2. Only nine homologs were tested - three per category. The conclusions about monomorphic behavior generalizing across the whole βRfaH clade are basically resting on three proteins. Bringing in even one or two phylogenetically distant βRfaH candidates would help guard against the possibility that what they're seeing is just a genus-specific quirk. If new experiments aren't on the table, the limitation should at least be called out explicitly in the Discussion.

    We agree with the reviewer that drawing conclusions from a single clade of RfaH could raise concerns about bias, although we must note that the tested putative monomorphic candidates were selected before a phylogenetic tree was constructed. What we propose is to perform a phylogenetic analysis for the InterPro sequences and look at their genomic neighborhood as well, replicating what was done in the manuscript for the Genomic Cluster group. We hope this would provide more compelling evidence that the predictions, phylogeny and gene organization of these extant monomorphic RfaH orthologs are distinct from those of the metamorphic ones.

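    3. The classification thresholds (α > 32.5% / β < 2.5% for αRfaH; β > 30.0% / α < 2.5% for βRfaH) are described as coming from histogram inspection, but they feel a bit arbitrary as stated. A quick sensitivity analysis - how do the population fractions shift if you nudge the cutoffs ±5%? - would help reassure the reader.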

    Thanks to the reviewer for raising this concern. We performed a sensitivity analysis by nudging the cutoffs by ±5%, as recommended by the reviewer, and we see minimal changes in the number of structures in each class. We have added a short paragraph describing this sensitivity test:

    “To determine that these values were adequate for our analysis, we performed a sensitivity test by changing the thresholds by ±5% over the data for all structures predicted from all databases, showing that the predictions of RfaH orthologs with monomorphic CTD and mixed secondary structure in their CTD are robust, and only metamorphic RfaH orthologs were reduced, with an increase in uncategorized structures (Supplementary Figure S13).”
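
A sensitivity test of this kind could be sketched as follows. This is only an illustration, not the authors' pipeline: the cutoffs are the ones quoted by Reviewer #2, the ±5% nudge is assumed to be a relative scaling of every cutoff, the criteria for the mixed α/β class are not given in this thread (so anything that is neither α nor β is labeled "other"), and the input percentages are hypothetical.

```python
from collections import Counter

# Cutoffs quoted in the review: percent of CTD residues in each secondary structure.
ALPHA_MIN, BETA_MIN, MINOR_MAX = 32.5, 30.0, 2.5


def classify(alpha_pct, beta_pct, scale=1.0):
    """Label one predicted CTD; `scale` nudges every cutoff (assumed to be relative)."""
    a_min, b_min, minor = ALPHA_MIN * scale, BETA_MIN * scale, MINOR_MAX * scale
    if alpha_pct > a_min and beta_pct < minor:
        return "alpha"
    if beta_pct > b_min and alpha_pct < minor:
        return "beta"
    # Mixed or uncategorized: the rule separating these two is not stated here.
    return "other"


def sensitivity(models, scales=(0.95, 1.0, 1.05)):
    """Count class labels at each cutoff scale."""
    return {s: Counter(classify(a, b, s) for a, b in models) for s in scales}


# Hypothetical (alpha %, beta %) values, for illustration only.
models = [(40.0, 0.0), (33.0, 1.0), (0.0, 45.0), (1.0, 31.0), (20.0, 15.0)]
for scale, counts in sensitivity(models).items():
    print(scale, dict(counts))
```

Structures that sit close to a cutoff change label when the cutoffs are scaled, while structures far from any cutoff do not, which is the behavior such a test is meant to expose.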

    4. The Discussion notes that uncontrolled, ops-independent RfaH recruitment could be lethal, since RfaH outcompetes the much more abundant NusG. But if monomorphic RfaH proteins really are extant and stably maintained in these genomes, there has to be something keeping them from interfering with NusG's essential functions - maybe very low expression, restricted induction, or compensating differences in NusG affinity. The paper would benefit from tackling this directly, even speculatively.

    We agree with the reviewer on this point, and after careful consideration we believe that we did not emphasize it appropriately in the manuscript. In fact, we included Figure 7 to state our perspective on how RfaH may have evolved, but we did not emphasize how this perspective stems from a previous work that we thoroughly discussed in the introduction (doi: 10.1038/emboj.2008.268) and that explicitly states that low solubility of the dissociated NTD and CTD could be a factor imposing this restricted action in cis operons. We have included this in our revised version of the manuscript as follows:

    “Our findings are in line with the previous hypothesis regarding the emergence of RfaH within the universally conserved family of NusG transcription factors (Belogurov et al, 2009). Under that model, a gene duplication event produced an intermediate variant (NusG2 in Figure 7) that lost its Rho-binding capability and acquired a deletion in the NTD that reduced the protein's overall size and remodeled its hydrophobic profile. Crucially, this intermediate retained an exposed, hydrophobic RNAP-binding region, a feature shared by monomorphic RfaH and the ancestor of all RfaH orthologs (NusGSP in Figure 7). This increased hydrophobicity would have reduced solubility, restricting its regulatory activity to the site of synthesis, i.e. in cis. Indeed, when structural alignment is used to identify conserved NTD residues that bind to RNAP, orthologs contain more than 70% hydrophobic residues (Supplementary Figure S11) at those positions. This percentage is much closer to that of RfaH (80%) than NusG (57.14%). The protein only regains solubility and the ability to operate in trans when its CTD refolds into a helical conformation. Ultimately, our results strengthen this evolutionary model by demonstrating that several extant RfaH orthologs appear to resemble this insoluble, cis-acting ancestral state.”

    Minor comments:

    Table 1 should show percentages alongside the raw counts. 7/7 LPS-in-operon for monomorphic candidates is striking, but with n=10, the small denominator really deserves to be flagged.

    We agree with the reviewer in that the higher raw count of metamorphics may undersell the message the article conveys. We added the percentages next to raw counts in Table 1, regarding “Total” and “Next to operon” categories. We also modified the legend as follows:

    “A summary of genomic contexts of RfaH orthologs classified according to the AF2 predictions. The numbers indicate how many rfaH genes are next to an operon and whether the operon contains lipopolysaccharide biosynthesis genes, and the percentages next to them display the relation to the previous category, i.e., “Next to operon”/“Total” and “LPS in operon”/“Next to operon”.”
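
The nested percentages described in this legend can be computed as in the minimal sketch below. The function name is hypothetical, and the example counts (10 total, 7 next to an operon, 7 with LPS genes in the operon) are one reading of the reviewer's remark about the monomorphic candidates, not values copied from Table 1.

```python
def nested_percentages(total, next_to_operon, lps_in_operon):
    """Return each percentage relative to the preceding category."""

    def pct(part, whole):
        # Guard against an empty parent category.
        return round(100.0 * part / whole, 1) if whole else None

    return {
        "next_to_operon_pct": pct(next_to_operon, total),
        "lps_in_operon_pct": pct(lps_in_operon, next_to_operon),
    }


# Illustrative counts only (see the note above).
print(nested_percentages(10, 7, 7))
```

Reporting the denominator next to each percentage keeps small categories visible, which is the concern the reviewer raised.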

    In Figure 3, the sequence logo on top is informative - consider adding the number of sequences per dataset to the axis labels so readers can interpret the boxplot widths.

    We believe that this would be rather confusing for the readers, because each box plot counts all 5 structures predicted by AlphaFold2 for each sequence in each dataset that fit each classification, and thus the same sequence can lead to structures that are monomorphic, metamorphic or have mixed secondary structure in their CTD. Thus, the number of sequences summed over the box plots will be higher than the number of sequences per dataset. For example, one sequence from InterPro can be present in more than one box plot, because different AlphaFold2 models can lead to the prediction of different states from the same sequence. We believe it is less confusing if it is presented as it is.
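
The counting issue described above can be illustrated with a small sketch. The sequence identifiers and per-model labels are hypothetical, and the code only demonstrates why per-class sequence counts can sum to more than the number of sequences when each sequence contributes five independently classified models.

```python
from collections import defaultdict


def sequences_per_class(model_labels):
    """Map each class to the set of sequences with at least one model in it."""
    by_class = defaultdict(set)
    for seq_id, labels in model_labels.items():
        for label in labels:
            by_class[label].add(seq_id)
    return by_class


# Hypothetical labels for the five AlphaFold2 models of each sequence.
model_labels = {
    "seqA": ["alpha"] * 5,
    "seqB": ["beta", "beta", "beta", "mixed", "beta"],
    "seqC": ["alpha", "mixed", "alpha", "alpha", "beta"],
}

by_class = sequences_per_class(model_labels)
per_box = {label: len(ids) for label, ids in by_class.items()}
print(per_box, "sum:", sum(per_box.values()), "sequences:", len(model_labels))
```

Here three sequences produce a per-class total of six, because two of them appear in more than one class.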

    There's some redundancy between the Results (pp. 14-17) and the Discussion that could probably be trimmed, particularly the recap of the ops/RBS construct logic.

    Thanks for the recommendation. We reduced this redundancy in the new version of the manuscript, mainly on page 16:

    “The orthologs classified as monomorphic, and thus expected to be constitutively active, exhibited activity across all tested ops contexts, including in the absence of RBS (Figure 5B-F). Notably, their activity levels were comparable to the ops-independent *E. coli* RfaH E48A mutant, in which the key salt bridge at the NTD:CTD interface is disrupted. All monomorphic orthologs were found to lack a few key residues that make contacts to ops DNA in RfaH, as well as the conserved residues in loop 2 that mediate contacts with Rho in NusG (Supplementary Figure 11). This mosaic architecture enables these orthologs to promote the expression of the long *lux* operon even when the RBS is absent. Our study provides the first indirect evidence of putative, constitutively active RfaH proteins, which are predicted to have monomorphic NusG-like fold, in other bacteria.”

    Reviewer #2 (Significance (Required)):

    If the central claim holds up, this is a meaningful contribution to the metamorphic-protein and bacterial-transcription literatures: it identifies what appear to be extant evolutionary "way-stations" in the NusG→RfaH transition, and it does so using a tractable computational pipeline that could be applied to other suspected fold-switch families. The work is timely given the ongoing discussion about how AF2 and its descendants handle conformational heterogeneity. With the strengthening suggested above - particularly any direct biophysical confirmation of a monomorphic candidate - I would expect this to be a well-cited paper in its niche.

    We are very thankful for the reviewer’s comments on our manuscript.

    Reviewer #3 (Evidence, reproducibility and clarity (Required)):

    Summary:

    The manuscript by Tabilo-Agurto et al. uses in silico and experimental methods to elucidate the diversity of the metamorphic RfaH protein family. Of particular note is the sophisticated usage of AlphaFold2 to reconstruct the evolutionary tree of RfaH as well as the in vivo luminescence assays to substantiate the different structural states of the RfaH-CTD. Overall this is a well-written manuscript providing deeper insight into the structural and functional diversity of RfaH proteins, potentially relevant for other metamorphic proteins as well.

    Minor comments:

    1. 3rd paragraph of the introduction: The sentence starting with "To date, ...and nuclear magnetic resonance of these ancestors.." seems incomplete, as this reviewer believes the authors wanted to say "..and structural characterization by nuclear magnetic resonance spectroscopy of these ancestors..."

    Thanks for the attention to these details, we will amend this paragraph appropriately.

    2. 4th paragraph of the introduction: "..., that binds Rho or the ribosome (Mooney et al. 2009b)." Whereas this citation is correct for NusG-Rho interactions, it does not indicate ribosome binding. The direct interaction of NusG with the ribosome was shown in Burmann et al. Science 2010 and this reference should be added here.

    Thanks for the recommendation, we will include both citations in this section of the manuscript.

    3. More a curiosity question: did the authors also test AlphaFold3 predictions for a subset of the RfaH variants, and obtain similar or different results?

    Thanks for the comment. The reason we did not use AlphaFold3 predictions to check the variability of the results is that much more is known about AlphaFold2, and its limitations, for the study of metamorphic proteins, whereas a deep understanding of the advantages and limitations of AlphaFold3 for studying metamorphic proteins is still under development.

    Referees cross-commenting:

    Overall there is an agreement among all reviewers that the present MS is an interesting and timely study. The point raised by reviewer 2 to add simple biophysical characterization, if feasible, would be clearly an excellent addition and likely make the MS stronger. In general all three reviewers mainly point to minor changes and additions to improve the MS in a rather short timeframe.

    We indeed agree with this comment, which is why we will commit to attempt the recombinant expression and protein purification of the RfaH orthologs and to perform circular dichroism assays if the solubility of the obtained proteins allows for such experiments to be done.

    Reviewer #3 (Significance (Required)):

    The present MS is an interesting large-scale usage of the AlphaFold2 algorithm to reconstruct the evolutionary tree of the specialized transcription elongation factor RfaH. Revealing a different degree of this evolution in a diverse set of bacterial strains indicating its evolutionary distance from the cognate NusG transcription elongation factor. Of particular note is the experimental verification of the obtained in silico finding by in vivo luminescence approaches.

    We are very thankful for the reviewer’s comments on our manuscript.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Referee #3

    Evidence, reproducibility and clarity

    The manuscript by Tabilo-Agurto et al. uses in silico and experimental methods to elucidate the diversity of the metamorphic RfaH protein family. Of particular note is the sophisticated usage of AlphaFold2 to reconstruct the evolutionary tree of RfaH as well as the in vivo luminescence assays to substantiate the different structural states of the RfaH-CTD. Overall this is a well-written manuscript providing deeper insight into the structural and functional diversity of RfaH proteins, potentially relevant for other metamorphic proteins as well.

    Minor Points:

    • 3rd paragraph of the introduction: The sentence starting with "To date, ...and nuclear magnetic resonance of these ancestors.." seems incomplete, as this reviewer believes the authors wanted to say "..and structural characterization by nuclear magnetic resonance spectroscopy of these ancestors..."
    • 4th paragraph of the introduction: "..., that binds Rho or the ribosome (Mooney et al. 2009b)." Whereas this citation is correct for NusG-Rho interactions, it does not indicate ribosome binding. The direct interaction of NusG with the ribosome was shown in Burmann et al. Science 2010 and this reference should be added here.
    • More a curiosity question: did the authors also test AlphaFold3 predictions for a subset of the RfaH variants, and obtain similar or different results?

    Referees cross commenting

    Overall there is an agreement among all reviewers that the present MS is an interesting and timely study. The point raised by reviewer 2 to add simple biophysical characterization, if feasible, would be clearly an excellent addition and likely make the MS stronger. In general all three reviewers mainly point to minor changes and additions to improve the MS in a rather short timeframe.

    Significance

    The present MS is an interesting large-scale usage of the AlphaFold2 algorithm to reconstruct the evolutionary tree of the specialized transcription elongation factor RfaH. Revealing a different degree of this evolution in a diverse set of bacterial strains indicating its evolutionary distance from the cognate NusG transcription elongation factor. Of particular note is the experimental verification of the obtained in silico finding by in vivo luminescence approaches.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Referee #2

    Evidence, reproducibility and clarity

    In their paper "Exploration of the structural and functional diversity in the metamorphic RfaH subfamily," Tabilo-Agurto et al. use AlphaFold2 to predict the structures of ~3,900 RfaH homologs, sort the predicted C-terminal domains into α-helical (autoinhibited), β-barrel (NusG-like), and mixed α/β topologies, and find that about 14% of homologs come out predominantly in the β-barrel state. They then take nine representative homologs and run them through a heterologous E. coli DH5α ΔrfaH reporter assay. The putative monomorphic candidates behave a lot like the constitutively active E48A variant - active across every ops context and even without an RBS - while the mixed α/β candidates barely show activity. Targeted MD simulations of E. coli RfaH, run through AF2Rank, also pull out the mixed α/β state as its own distinct cluster, hinting that it sits somewhere along the fold-switching transition path.

    This is a genuinely interesting piece of work that pulls together structure prediction, in vivo activity, and genomic context to make a concrete case for extant monomorphic βRfaH proteins - a long-hypothesized but until now unseen intermediate in the proposed stepwise evolution of RfaH from NusG. The experimental design is thoughtful, especially the five-construct ops/RBS matrix, and comparing the monomorphic candidates against the E48A benchmark is a nice touch as a positive control. Overall, I think the paper deserves to be published, but a few things would need shoring up before acceptance.

    Major comments

    1. The paper would be a lot stronger with at least one biophysical measurement (a CD spectrum, say) on a purified monomorphic candidate. I get that this might be outside the planned scope, but even a single CD trace showing β-rich content for an isolated full-length protein would move the claim from "putative" to "demonstrated."
    2. Only nine homologs were tested - three per category. The conclusions about monomorphic behavior generalizing across the whole βRfaH clade are basically resting on three proteins. Bringing in even one or two phylogenetically distant βRfaH candidates would help guard against the possibility that what they're seeing is just a genus-specific quirk. If new experiments aren't on the table, the limitation should at least be called out explicitly in the Discussion.
    3. The classification thresholds (α > 32.5% / β < 2.5% for αRfaH; β > 30.0% / α < 2.5% for βRfaH) are described as coming from histogram inspection, but they feel a bit arbitrary as stated. A quick sensitivity analysis - how do the population fractions shift if you nudge the cutoffs ±5%? - would help reassure the reader.
    4. The Discussion notes that uncontrolled, ops-independent RfaH recruitment could be lethal, since RfaH outcompetes the much more abundant NusG. But if monomorphic RfaH proteins really are extant and stably maintained in these genomes, there has to be something keeping them from interfering with NusG's essential functions - maybe very low expression, restricted induction, or compensating differences in NusG affinity. The paper would benefit from tackling this directly, even speculatively.

    Minor comments

    Table 1 should show percentages alongside the raw counts. 7/7 LPS-in-operon for monomorphic candidates is striking, but with n=10, the small denominator really deserves to be flagged.

    In Figure 3, the sequence logo on top is informative - consider adding the number of sequences per dataset to the axis labels so readers can interpret the boxplot widths.

    There's some redundancy between the Results (pp. 14-17) and the Discussion that could probably be trimmed, particularly the recap of the ops/RBS construct logic.

    Significance

    If the central claim holds up, this is a meaningful contribution to the metamorphic-protein and bacterial-transcription literatures: it identifies what appear to be extant evolutionary "way-stations" in the NusG→RfaH transition, and it does so using a tractable computational pipeline that could be applied to other suspected fold-switch families. The work is timely given the ongoing discussion about how AF2 and its descendants handle conformational heterogeneity. With the strengthening suggested above - particularly any direct biophysical confirmation of a monomorphic candidate - I would expect this to be a well-cited paper in its niche.

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Referee #1

    Evidence, reproducibility and clarity

    Summary:

    This is an interesting and ambitious study by Tabilo-Agurto and co-workers. It combines deep learning structure prediction (AlphaFold2), targeted molecular dynamics simulations, and in vivo functional assays to probe structural, functional, and evolutionary aspects of the metamorphic protein RfaH. More broadly, the work addresses an important question: whether intermediate structural states may exist along evolutionary trajectories of metamorphic proteins. A particular strength of the study is the integration of computational and experimental approaches. The manuscript is generally well written and clearly organized.

    Major comment:

    A key aspect of the study is the classification of predicted structures into three classes based on the conformation of the C-terminal domain (CTD): the autoinhibited alpha-helical fold, a beta-barrel fold, and a mixed alpha/beta fold. These classes are further described as corresponding to metamorphic (alpha fold), mixed alpha/beta, and monomorphic (beta fold) proteins.

    While I can see how this organizational scheme is helpful in some respects, it may also overstate what can be concluded from the data. As the authors are well aware, AlphaFold2 tends to predict a single conformation even for genuine metamorphic proteins, and therefore does not, on its own, distinguish between monomorphic and fold-switching proteins. I note in particular that the functional data indicates that the "monomorphic" variants studied in the in vivo assays behave similarly to the RfaH E48A mutant. However, E48A is known to remain metamorphic, populating both alpha and beta folds with roughly equal probability. This suggests that the sequences in this class may retain some degree of fold-switching capability, even if the underlying regulatory mechanism differs from that of wild-type RfaH. In other words, the presented data does not fully support these sequences as monomorphic. I am not suggesting that the authors must revise their classification scheme. However, it may strengthen the manuscript if the authors explicitly acknowledge this alternative interpretation and moderate the corresponding claims.

    Significance

    An intriguing, but speculative, aspect of the study is the finding that some sequences are predicted to adopt a CTD with mixed alpha/beta secondary structure, and that such structures also appear in targeted molecular dynamics simulations. If this idea holds up, it could represent an intermediate along the evolutionary pathway between the alpha-helical and beta-barrel folds of RfaH. Although the evidence is only computational, it is a compelling idea and it would benefit from further investigation.

    In summary, the work is a valuable contribution to the field of protein fold switching. The combination of computational tools with experimental validation makes it interesting and the results should be of broad interest. The manuscript should be well positioned for publication in a high-impact journal.