A revised genome annotation of the model cyanobacterium Synechocystis based on start and stop codon-enriched ribosome profiling and proteogenomics
Abstract
Cyanobacteria are important primary producers and are used in biotechnology as microbial cell factories due to their ability to use solar light for oxygenic photosynthesis. Synechocystis sp. PCC 6803 is a popular model cyanobacterium, yet there are ambiguities in the precise coding regions of many genes, and numerous genes encoding small proteins have remained undetected. Here we present the results of a ribosome profiling (Ribo-seq) analysis involving inhibitors that stall ribosomes at translation initiation and termination sites (TIS- and TTS-Ribo-seq), combined with a proteogenomic reevaluation and reannotation of the entire Synechocystis genome. We report evidence for the translation of 3,050 annotated genes based on proteogenomics (83%), of 3,492 based on Ribo-seq (95.2%), and of 3,009 supported by both methods (82%). The data suggested both novel protein-coding genes and corrections for annotated ones. We validated 15 novel small proteins translated from antisense RNAs and from intergenic and intragenic regions, and identified 69 novel, mostly small proteins based on proteogenomics. In slr0489, slr1079 and slr1082 we identified three genes with ~300 nt long intragenic out-of-frame coding regions and show that both the internal and host reading frames are translated. The resulting proteins interact with each other, resembling certain defense or toxin-antitoxin systems. Our data illustrate the enormous value of consolidating genome annotations in the context of integrated experimental data and suggest that genome annotations in general need to be extended and revised. All of our data can be accessed via an intuitive and interactive genome browser platform at https://www.bioinf.uni-freiburg.de/~ribobase/.