Pervasive translation of short open reading frames and de novo gene emergence in Arabidopsis
Abstract
Ancestrally non-genic sequences are now widely recognized as potential reservoirs for the de novo emergence of new genes. Across clades, some de novo genes have been shown to have substantial phenotypic effects and to contribute to the emergence of novel biological functions. Yet very little is known about the starting material from which de novo genes emerge, especially in plants. To fill this gap, we generated ribosome profiling data from the closely related species Arabidopsis halleri, A. lyrata and A. thaliana and characterized genome-wide patterns of translation across them. Synteny analysis revealed 211 Open Reading Frames (ORFs) that have emerged de novo within the Arabidopsis genus and already exhibit signs of active translation. Most of these de novo translated ORFs were species- and even accession-specific, indicating their transient nature, with patterns of polymorphism consistent with neutral evolution in natural populations. They were also significantly shorter and less expressed than conserved Coding DNA Sequences (CDS), and their GC content increased with phylogenetic conservation. While most of them were located in intergenic regions and are thus newly discovered, 34 were previously annotated as CDS in at least one genome and are promising putative genes. Our results demonstrate the abundance of translation events outside of conserved CDS, and their role as starting material for the emergence of novel genes in plants.