Principles for the encoding of molecular information in DNA, RNA and protein motifs

Abstract

Genomes, transcriptomes and proteomes contain binding sites that can be well described as short linear sequence motifs, which are usually recognized by globular protein domains. Motif evolution requires effective exploration of sequence space and is fundamentally constrained by the requirement that functional motif instances be discriminated from the large number of non-target sites in the corresponding search space. Molecular information theory and energy landscape theory show that the size of the effective sequence space for motifs is simply the square root of the alphabet size of the polymer. Empirical estimates of the effective sequence space, obtained from sequence statistics and from a combination of motif length, search space size and the number of motif instances in natural organisms, support the theoretical predictions for DNA, RNA and protein motifs. We calculate how the requirements for recognition of multiple instances of a motif scale with the size of the search space. For many organisms, encoding such motif networks requires additional features such as the use of modified nucleotides or amino acids, oligomerization of the protein recognizer and/or the presence of multiple detectors. We argue that functional encoding in nucleotide and protein motifs responds to the same computational principles.
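
The relationship between motif length, search space size and number of motif instances can be illustrated with a short calculation. The sketch below is not the authors' method: it assumes the standard molecular information theory estimate that locating γ sites among G candidate positions requires log2(G/γ) bits, and it assumes this information is spread evenly over the motif positions. The function names and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def information_required_bits(search_space_size: float, n_instances: float) -> float:
    """Bits needed to pick n_instances sites out of search_space_size positions.

    Assumption: the log2(G / gamma) estimate from molecular information theory.
    """
    return math.log2(search_space_size / n_instances)


def effective_states_per_position(
    search_space_size: float, n_instances: float, motif_length: int
) -> float:
    """Effective number of distinguishable states per motif position.

    Assumption: the required information is shared equally by all positions,
    so each position carries bits / motif_length and encodes 2**(that) states.
    """
    bits = information_required_bits(search_space_size, n_instances)
    return 2 ** (bits / motif_length)


# Hypothetical example: 2**20 candidate positions, 2**4 motif instances,
# motif length of 16 positions.
bits = information_required_bits(2**20, 2**4)
states = effective_states_per_position(2**20, 2**4, 16)

# For a four-letter alphabet (DNA or RNA), the square root of the alphabet size is 2.
sqrt_alphabet = math.sqrt(4)
```

With these illustrative values the motif must carry 16 bits, or one bit per position, which corresponds to two effective states per position and matches the square root of a four-letter alphabet. Real motifs distribute information unevenly across positions, so this is only a rough consistency check.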
