Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning

Raeline Valbuena
AkshatKumar Nigam
Josh Tycko
Peter Suzuki
Kaitlyn Spees
Aradhana
Sophia Arana
Peter Du
Roshni A. Patel
Lacramioara Bintu
Anshul Kundaje
Michael C. Bassik

Abstract

Regulatory proteins have evolved diverse repressor domains (RDs) to enable precise context-specific repression of transcription. However, our understanding of how sequence variation impacts the functional activity of RDs is limited. To address this gap, we generated a high-throughput mutational scanning dataset measuring the repressor activity of 115,000 variant sequences spanning more than 50 RDs in human cells. We identified thousands of clinical variants with loss or gain of repressor function, including TWIST1 HLH variants associated with Saethre-Chotzen syndrome and MECP2 domain variants associated with Rett syndrome. We also leveraged these data to annotate short linear interacting motifs (SLiMs) that are critical for repression in disordered RDs. Then, we designed a deep learning model called TENet (Transcriptional Effector Network) that integrates sequence, structure and biochemical representations of sequence variants to accurately predict repressor activity. We systematically tested generalization within and across domains with varying homology using the mutational scanning dataset. Finally, we employed TENet within a directed evolution sequence editing framework to tune the activity of both structured and disordered RDs and experimentally test thousands of designs. Our work highlights critical considerations for future dataset design and model training strategies to improve functional variant prioritization and precision design of synthetic regulatory proteins.
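
The two computational ideas named in the abstract, enumerating sequence variants and using a predictor to guide sequence edits, can be illustrated with a small sketch. This is not the authors' pipeline and not TENet: the scoring function `toy_score`, the example sequence, and the greedy one-substitution-per-round search are all hypothetical stand-ins chosen only to make the loop concrete. In practice the score would come from a trained model and candidates would be validated experimentally.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(seq):
    """Yield every single-residue substitution variant of seq."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield seq[:i] + aa + seq[i + 1:]


def toy_score(seq):
    """Hypothetical placeholder for a trained activity predictor.

    Returns the fraction of hydrophobic residues. It has no biological
    meaning here; it only gives the search something to optimize.
    """
    return sum(seq.count(aa) for aa in "AILMFVW") / len(seq)


def evolve(seq, score_fn, rounds=3, sample_size=200, seed=0):
    """Greedy model-guided search: accept the best single edit per round."""
    rng = random.Random(seed)
    best, best_score = seq, score_fn(seq)
    for _ in range(rounds):
        candidates = list(single_substitutions(best))
        # Score a random subset so the cost stays bounded for long sequences.
        sampled = rng.sample(candidates, min(sample_size, len(candidates)))
        top = max(sampled, key=score_fn)
        top_score = score_fn(top)
        if top_score <= best_score:
            break  # no sampled edit improves the predicted score
        best, best_score = top, top_score
    return best, best_score


if __name__ == "__main__":
    start = "MKRSTDEQ"  # hypothetical 8-residue sequence, not from the study
    designed, score = evolve(start, toy_score)
    print(start, "->", designed, round(score, 3))
```

Swapping `toy_score` for a real predictor changes only the `score_fn` argument. To tune activity toward a set point rather than maximize it, the score could be replaced by the negative distance between the predicted activity and the target value.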

Version published to 10.1101/2024.09.21.614253 on bioRxiv
Sep 24, 2024

