Principles for the encoding of molecular information in DNA, RNA and protein motifs

Ezequiel Alejandro Galpern
Inés Bauer
Diego Ulises Ferreiro
Ignacio Enrique Sanchez


Abstract

Genomes, transcriptomes and proteomes contain binding sites that can be well described as short linear sequence motifs, which are usually recognized by globular protein domains. Motif evolution requires effective exploration of sequence space and is fundamentally constrained by the requirement that functional motif instances be discriminated from the large number of non-target sites in the corresponding search spaces. Molecular information theory and energy landscape theory show that the size of the effective sequence space for motifs is simply the square root of the alphabet size of the polymer. Empirical estimates of the effective sequence space, obtained from sequence statistics and from a combination of motif length, search space size and the number of motif instances in natural organisms, support the theoretical predictions for DNA, RNA and protein motifs. We calculate how the requirements for recognition of multiple instances of a motif scale up with the size of the search space. For many organisms, encoding such motif networks requires the appearance of further events such as the use of modified nucleotides/amino acids, oligomerization of the protein recognizer and/or the presence of multiple detectors. We discuss how functional encoding in nucleotide and protein motifs responds to the same computational principles.
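The scaling argument in the abstract can be illustrated with a short numerical sketch. It rests on two assumptions that are not spelled out on this page: that the information needed to pick out the motif instances is the frequency measure log2(G / γ) from molecular information theory, where G is the search space size and γ the number of instances, and that an effective per-position sequence space of sqrt(A) for an alphabet of size A corresponds to 0.5 · log2(A) bits of information per motif position. The function names are hypothetical and the preprint's own formulation may differ.

```python
import math


def required_information_bits(search_space_size: float, num_instances: float) -> float:
    """Bits needed to single out the motif instances among all candidate sites.

    Assumption: the frequency-based measure log2(G / gamma).
    """
    if search_space_size <= 0 or num_instances <= 0:
        raise ValueError("search space size and number of instances must be positive")
    if num_instances > search_space_size:
        raise ValueError("number of instances cannot exceed the search space size")
    return math.log2(search_space_size / num_instances)


def bits_per_position(alphabet_size: int) -> float:
    """Information carried by one motif position.

    Assumption: the effective per-position space is sqrt(A), so the
    information is log2(A) - log2(sqrt(A)) = 0.5 * log2(A).
    """
    if alphabet_size < 2:
        raise ValueError("alphabet size must be at least 2")
    return 0.5 * math.log2(alphabet_size)


def minimum_motif_length(search_space_size: float, num_instances: float,
                         alphabet_size: int) -> int:
    """Smallest whole number of positions whose information covers the requirement."""
    needed = required_information_bits(search_space_size, num_instances)
    return math.ceil(needed / bits_per_position(alphabet_size))
```

For example, `minimum_motif_length(2**22, 64, 4)` treats a hypothetical nucleotide search space of 2**22 positions containing 64 instances, and `bits_per_position(20)` gives the per-position figure for a 20-letter amino acid alphabet under the same assumption. Because the required information grows only logarithmically with the search space, the minimum length rises slowly, which is the regime in which the additional mechanisms listed in the abstract would become relevant.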

Version published to 10.21203/rs.3.rs-9404382/v1 on Research Square
Apr 15, 2026

Learning the structural diversity in random protein sequence space

This article has 9 authors:
1. Filip Buchel
2. Tereza Neuwirthova
3. Theodora Tureckiova
4. Gustavo Fuertes
5. Ales Benda
6. Dalibor Panek
7. Matus Fricek
8. Mohammed AlQuraishi
9. Klara Hlouchova
This article has no evaluations. Latest version: May 5, 2026

Residue burial encodes a protein’s fold

This article has 3 authors:
1. Alex T. Grigas
2. Jacob Sumner
3. Corey S. O’Hern
This article has no evaluations. Latest version: Mar 31, 2026

Informational blueprints reveal condition-dependent gene regulatory architectures

This article has 7 authors:
1. Doruk Efe Gökmen
2. Rosalind Wenshan Pan
3. Tom Röschinger
4. Stephen Quake
5. Hernan G Garcia
6. Rob Phillips
7. Vincenzo Vitelli
This article has no evaluations. Latest version: May 20, 2026
