Pervasive translation of short open reading frames and de novo gene emergence in Arabidopsis
Abstract
Ancestrally non-genic sequences are now widely recognized as potential reservoirs for the de novo emergence of new genes. Across clades, some de novo genes have been shown to have substantial phenotypic effects and to contribute to the emergence of novel biological functions. Yet very little is still known about the starting material from which de novo genes emerge, especially in plants. To fill this gap, we produced ribosome profiling data for the closely related species Arabidopsis halleri and A. lyrata, which we combined with data for the model species A. thaliana, and compared genome-wide patterns of translation across the three species. Synteny analysis revealed 211 open reading frames (ORFs) that show signs of active translation and that have emerged de novo within the Arabidopsis genus. Most of these de novo translated ORFs were species-specific and neutrally evolving, indicating their transient nature. They were also significantly shorter and more weakly expressed than conserved coding DNA sequences (CDS), and their GC content increased with phylogenetic conservation. While most of them were located in intergenic regions, 34 were previously annotated as CDS in at least one genome and are amongst the most promising gene candidates. Our results illuminate the abundance of translation events outside of conserved CDS and their role as starting material for the emergence of novel genes.