Genome scale identification of new genes using saturated reporter transposon mutagenesis

Emily C. A. Goodall
Freya Hodges
Weine Kok
Budi Permana
Thom Cuddihy
Zihao Yang
Nicole Kahler
Kenneth Shires
Karthik Pullela
Von Vergel L. Torres
Jessica L. Rooke
Antoine Delhaye
Jean-François Collet
Jack A. Bryant
Brian M. Forde
Matthew Hemm
Ian R. Henderson

Abstract

Small or overlapping genes are prevalent across all domains of life but are often overlooked in annotation and functional studies because they are difficult to detect. High-density mutagenesis and data-mining studies suggest that bacterial genomes hold further coding potential. To overcome limitations in existing protein detection methods, we applied a genetics-based approach. We combined transposon insertion sequencing with a translation reporter to identify translated open reading frames throughout the genome at scale, independent of genome annotation. We applied our method to the well-characterised species Escherichia coli and identified ∼200 putative novel protein coding sequences (CDSs). These are mostly short CDSs (<50 amino acids) and in some cases highly conserved. We validated the expression of selected CDSs, demonstrating the utility of this approach. Despite the extensive study of E. coli, this method revealed proteins that have not been described previously, including proteins that are conserved and that neighbour functionally important genes, suggesting that these small proteins have significant functional roles. We present this as a complementary method to whole cell proteomics and ribosome trapping for condition-dependent identification of protein CDSs, and as a high-throughput method for testing conditional gene expression. We anticipate this technique will be a starting point for future high-throughput genetics investigations to determine the existence of unannotated genes in multiple bacterial species.
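As a rough illustration of the idea in the abstract (reporter insertions used as evidence that an open reading frame is translated, without relying on annotation), the toy sketch below scans a sequence for candidate ORFs and counts insertions that fall in frame with each one. It is not the authors' pipeline. The function names, the forward-strand-only scan, the ATG-only start rule, and the assumption that an insertion coordinate maps directly to a codon boundary are all hypothetical simplifications.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) for ATG..stop ORFs on the forward strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons counts codons before the stop. Hypothetical helper.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def in_frame_hits(orf: Tuple[int, int], insertions: List[int]) -> int:
    """Count insertion positions inside the ORF that sit on a codon boundary.

    Assumes (for illustration only) that a reporter fusion is productive
    when the insertion coordinate is a multiple of 3 from the ORF start.
    """
    start, end = orf
    return sum(
        1 for pos in insertions
        if start <= pos < end and (pos - start) % 3 == 0
    )


# Toy sequence: one ORF of five codons plus a stop, flanked by spacer bases.
toy_seq = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
toy_orfs = find_orfs(toy_seq)
toy_insertions = [2, 5, 6, 11, 21]  # made-up insertion coordinates
toy_counts = {orf: in_frame_hits(orf, toy_insertions) for orf in toy_orfs}
```

In this toy case a single ORF is found and three of the five made-up insertions are counted as in frame. A real analysis would also need to handle both strands, insertion orientation, alternative start codons, and background insertion density.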

Version published to 10.1101/2024.09.06.611592 on bioRxiv
Sep 6, 2024

Related articles

Towards a quantitative view of the NLR gene family evolution in the genome space

This article has 10 authors:
1. Luzie Wingen
2. Duncan Crosbie
3. Yiheng Hu
4. Eric Kemen
5. Xinyi Liu
6. Marion Müller
7. Niklas Schandry
8. Korbinian Schneeberger
9. Detlef Weigel
10. Aurélien Tellier
This article has no evaluations. Latest version: Dec 24, 2025

Genome-Wide Discovery and Characterization of Putative Antimicrobial Resistance-Associated Small Open Reading Frames (sORFs) in the Staphylococcus aureus Pan-Genome

This article has 4 authors:
1. Saad Khan
2. Mehede Hassan Rubel
3. Mahmudul Hasan
4. Juan Philippe Teixeira
This article has no evaluations. Latest version: Dec 19, 2025

Adaptive laboratory evolution with ethionine identifies novel genetic determinants for enhanced protein and methionine accumulation in Saccharomyces cerevisiae

This article has 7 authors:
1. Tae Hoon Lee
2. Sang-Hun Do
3. Hyun-Jae Lee
4. Kun-Jae Lee
5. Jonghyeok Shin
6. Yong-Cheol Park
7. Sun-Ki Kim
This article has no evaluations. Latest version: Jan 13, 2026
