Genomic dialects: How amino acid properties and the second codon base shape the informational accents of life

Abstract

Codon Usage Bias (CUB) is a fundamental feature of genomic architecture, reflecting a balance between mutational pressure and natural selection. We propose a “genomic dialects” framework, where species-specific CUB profiles represent “informational accents” constrained by biochemical and structural requirements. Utilizing a normalized informational index based on Shannon’s entropy, we analyzed CUB profiles for 18 amino acids across 1,406 species from the three domains of life. Linear models were employed to investigate the relationship between CUB and physicochemical properties, including Saier’s second-codon-base classification, molecular volume, hydrophobicity, aliphatic/aromatic status, and dissociation constants. CUB distributions are highly skewed, with >52% of values below 0.1, suggesting a near-optimal use of the genetic code’s potential. We demonstrate that amino acid properties significantly influence CUB, with Saier’s classification explaining up to 69% of variance in Archaea and 47% across all taxa. Hydrophobic amino acids (Q1 class) consistently exhibit higher average CUB than hydrophilic ones, particularly in microbes. Individual species models reveal extreme correlations; for example, in the alga Chlamydomonas reinhardtii, Saier classes explain >95% of CUB variance. Finally, we show that CUB-based dendrograms represent phenetic similarity (“genomic accents”) rather than reliable phylogenetic reconstructions, as they rarely coincide with the true Tree of Life. Our findings indicate that the “rules” of genomic dialects are largely anchored in the dual requirements of translational fidelity and protein stability. The observed “informational accents” are proximately governed by the metabolic and genomic machinery under the constraints of the drift-barrier hypothesis. This study provides a robust framework for understanding how the physical realities of amino acids have shaped the evolution of the genetic code’s informational use across the tree of life.
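
The abstract does not state the formula for the normalized informational index, so the following is only a minimal sketch of one common entropy-based form, assumed here for illustration: for an amino acid with k synonymous codons, the index is 1 - H / log2(k), where H is the Shannon entropy of the observed codon frequencies. Under this assumption, 0 means all synonymous codons are used equally and 1 means a single codon is used exclusively. The function name `normalized_cub` and the example counts are hypothetical and are not taken from the article.

```python
import math


def normalized_cub(codon_counts):
    """Entropy-based codon usage bias for one amino acid.

    ASSUMPTION: index = 1 - H / log2(k), with H the Shannon entropy of the
    synonymous codon frequencies and k the number of synonymous codons.
    This is an illustrative form, not necessarily the article's index.
    """
    counts = list(codon_counts.values())
    k = len(counts)
    total = sum(counts)
    if k < 2:
        # Single-codon amino acids (Met, Trp) carry no usage bias to measure.
        raise ValueError("need at least two synonymous codons")
    if total == 0:
        raise ValueError("no codon observations")
    # Shannon entropy in bits; codons with zero count contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Hypothetical counts for the four alanine codons in some genome.
alanine = {"GCT": 10, "GCC": 70, "GCA": 10, "GCG": 10}
print(f"Alanine CUB (illustrative): {normalized_cub(alanine):.3f}")
```

Restricting the calculation to amino acids with at least two synonymous codons is consistent with the abstract's count of 18 amino acids, since methionine and tryptophan each have a single codon in the standard code.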
