Generative AI-based design of hybrid transcriptional activator proteins with new DNA-binding specificity

Abstract

Transcriptional control arises from the specific recognition of promoter DNA by transcription factors (TFs), forming the basis of cellular information processing and gene regulation. In synthetic biology, TF-promoter interactions are assembled into gene circuits to program cellular behaviors. To ensure reliable circuit performance, most synthetic gene circuits rely on well-characterized and orthogonal regulatory parts. This reliance minimizes crosstalk but constrains circuit complexity and information integration. Creating hybrid TFs that combine or interpolate promoter specificities could therefore expand the design space of synthetic regulatory systems. However, it remains unclear whether hybrid functions can be created by mixing amino acid sequences, and how such functional integration could be achieved in a principled manner. Here we show that a variational autoencoder (VAE) trained on LuxR-family DNA-binding domains can generate transcription factors with hybrid and partially novel promoter recognition properties. By sampling intermediate regions of the VAE-learned latent space, we designed hybrid TFs that activate both the lux and las promoters. High-throughput sort-seq assays together with individual in vivo assays revealed that a subset of functional variants exhibited dual-responsive behavior while maintaining sequence-selective DNA recognition. Together, these results provide a data-driven strategy for exploring functional intermediate sequences between closely related proteins.

Article activity feed

  1. Generative AI-based design of hybrid transcriptional activator proteins with new DNA-binding specificity

    This work presents a clear and well-executed demonstration that hybrid transcription factor behavior can emerge from family-local sequence modeling within the LuxR transcription factor family. By restricting the model to a single fold and alignment framework, the approach captures evolutionary covariation that reflects long-standing constraints on protein stability and DNA recognition. The combination of pooled functional assays, randomized promoter libraries, and structural modeling convincingly shows that the designed variants retain structured sequence preferences while exhibiting broadened promoter responsiveness. As a proof of principle, the study illustrates how navigating sequence space within an evolutionarily tolerated region, rather than de novo scaffold or specificity design, can yield compact regulatory behaviors relevant to synthetic circuit engineering.

    From an evolutionary perspective, the success of the LuxR-LasR interpolation suggests that these regulators occupy nearby functional optima connected by a relatively shallow region of viable intermediates. It would be interesting to understand how broadly this navigability extends within the LuxR family. For example, were additional family members explored to assess how increasing sequence or functional divergence, particularly for regulators lying outside the LuxR-LasR-centered hyperspherical sampling region, affects the continuity of interpolation, or whether some regions of the family landscape are more constrained than others?

    The enhanced activity of variants such as 20L and 22L also raises questions about the nature of the hybrid phenotypes being accessed. While randomized promoter assays indicate preserved sequence selectivity, the expanded activation profiles are consistent with relaxed specificity rather than precise dual recognition. It would be interesting to know whether these permissive phenotypes reflect increased DNA occupancy, altered interactions with RNA polymerase, or changes in promoter competition dynamics, and to what extent these effects occur across the genome. Complementary genome-wide binding measurements, such as ChIP-seq or CUT&RUN, could help distinguish among these possibilities.

    Further, because LuxR-family transcription factors are dimeric and concentration-limited, it may be informative to examine how these hybrids behave when both promoters are present simultaneously. Placing Plux-GFP and Plas-RFP reporters in the same cell could reveal whether apparent dual responsiveness persists under direct competition for a shared transcription factor pool, or whether promoter bias emerges as expression levels vary.

    Finally, because the approach depends on a latent representation that enforces smoothness while retaining family-level covariation, it would be helpful to know how sensitive the observed hybrid behaviors are to choices such as latent dimensionality, KL weighting, or training initialization. This would clarify how robust the interpolation is, and whether similar results would be expected under modest changes to the model.
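
To make the last point concrete, the sketch below shows where KL weighting and latent dimensionality would enter a generic VAE objective, and how intermediate latent points between two encoded sequences could be generated. It is an illustration only, not the preprint's implementation: it assumes a diagonal Gaussian posterior, a standard normal prior, and straight-line interpolation, and the function names, the `beta` parameter, and the example latent vectors are all hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


def beta_vae_loss(recon_nll, mu, logvar, beta=1.0):
    """Negative ELBO with the KL term scaled by beta.

    recon_nll is the reconstruction negative log-likelihood, supplied by the
    caller. beta is the KL weight (an assumed knob; 1.0 gives the plain VAE).
    """
    return recon_nll + beta * kl_to_standard_normal(mu, logvar)


def interpolate_latent(z_a, z_b, n_points=5):
    """Evenly spaced points on the straight line from z_a to z_b, endpoints included."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)[:, None]  # column of mixing weights
    return (1.0 - t) * z_a + t * z_b


if __name__ == "__main__":
    # Placeholder latent means for two parent sequences (made-up numbers,
    # not values from the study). The vector length is the latent dimensionality.
    z_parent_a = np.array([0.8, -0.3, 0.1])
    z_parent_b = np.array([-0.5, 0.6, 0.4])

    path = interpolate_latent(z_parent_a, z_parent_b, n_points=5)
    print("intermediate latent points:\n", path)

    # The same posterior is penalized more heavily as beta grows.
    for beta in (0.5, 1.0, 4.0):
        loss = beta_vae_loss(recon_nll=2.0, mu=z_parent_a,
                             logvar=np.zeros(3), beta=beta)
        print(f"beta={beta}: loss={loss:.3f}")
```

A sensitivity check of the kind suggested above would retrain the model across several values of `beta`, latent sizes, and random seeds, then compare whether decoded sequences from the intermediate points remain similar. In practice each intermediate point would be passed through the trained decoder, which is outside the scope of this sketch.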