Informational blueprints reveal condition-dependent gene regulatory architectures

Doruk Efe Gökmen
Rosalind Wenshan Pan
Tom Röschinger
Stephen Quake
Hernan G Garcia
Rob Phillips
Vincenzo Vitelli

Abstract

While coding regions in the genome have a direct interpretation in terms of protein products, significant fractions of the genome are non-coding and yet control essential biological functions. Unlike the genetic code, there is no “lookup table” that identifies where regulatory proteins, known as transcription factors (TFs), bind. Here, we extract these binding sites by distilling sequences of nucleotide letters into collective coordinates (hyperletters) representing the binding sites that are active under specific environmental conditions. Going beyond local information footprints between individual bases and expression levels, our information blueprint algorithm compresses the global information by optimising filters that simultaneously scan an entire promoter sequence. Inspired by renormalisation-group techniques, we identify TF binding sites as coarse-grained variables combining groups of correlated mutations with the highest collective impact on gene expression. We validate our approach on experimental data for E. coli and discover novel regulatory elements, illustrating its deployment at scale across growth conditions.
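
For orientation, the "local information footprint" that the abstract takes as its starting point can be sketched as a per-position mutual information between mutation status and expression level. The snippet below is only a minimal illustration under stated assumptions, not the authors' blueprint algorithm: the function name `information_footprint`, the binary mutated/unmutated encoding, the discretised expression bins, and the four toy sequences are all hypothetical and chosen for readability.

```python
import math
from collections import Counter

def information_footprint(wild_type, sequences, expression_bins):
    """Mutual information (in bits) between mutation status at each
    position and a discretised expression level.

    Hypothetical helper for illustration; not taken from the preprint.
    """
    n = len(sequences)
    footprint = []
    for i in range(len(wild_type)):
        # Joint counts of (is position i mutated?, expression bin).
        joint = Counter(
            (seq[i] != wild_type[i], b)
            for seq, b in zip(sequences, expression_bins)
        )
        # Marginal counts for mutation status and for expression bin.
        mut_counts = Counter()
        bin_counts = Counter()
        for (mutated, b), c in joint.items():
            mut_counts[mutated] += c
            bin_counts[b] += c
        # I = sum over observed pairs of p(m, b) * log2(p(m, b) / (p(m) p(b))).
        mi = 0.0
        for (mutated, b), c in joint.items():
            mi += (c / n) * math.log2(c * n / (mut_counts[mutated] * bin_counts[b]))
        footprint.append(mi)
    return footprint

# Toy data (invented): a mutation at position 0 always coincides with
# low expression (bin 0); mutations at position 3 are uninformative.
wild_type = "ACGT"
sequences = ["ACGT", "TCGT", "ACGA", "TCGA"]
expression_bins = [1, 0, 1, 0]

footprint = information_footprint(wild_type, sequences, expression_bins)
print(footprint)
```

In this toy case only position 0 carries information about expression. A footprint of this kind scores each base independently, which is the limitation the abstract contrasts with filters that scan the whole promoter at once.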

Version published to 10.64898/2026.05.18.726006 on bioRxiv, May 20, 2026

Related articles

Partner determination from protein sequences using class information with CLAPP

This article has 5 authors:
1. Lisa Gennai
2. Francesco Caredda
3. Mathieu E. Rebeaud
4. Andrea Pagnani
5. Paolo De Los Rios
This article has no evaluations. Latest version: May 11, 2026

Discovering conserved regulatory modules in predicted gene regulatory networks across species

This article has 2 authors:
1. Jingyi Zhang
2. Lenwood S. Heath
This article has no evaluations. Latest version: May 16, 2026

MAJEC: unified gene, isoform, and locus-level transposable element quantification from RNA-seq

This article has 2 authors:
1. Tian-Yeh Lim
2. Ari J. Firestone
This article has no evaluations. Latest version: Apr 14, 2026
