N terminal wobble base usage determines ribosome loading and thus protein expression fate

Abstract

Heterologous expression of proteins and enzymes in prokaryotic hosts such as Escherichia coli and Rhodococcus ruber is central to biomanufacturing and biotechnology, yet a substantial fraction of proteins still fail to express for reasons that remain unknown. To address this long-standing problem, we focused on translation initiation, particularly the N-terminal coding-start region. By constructing large-scale 5′-synonymous codon libraries, we found that mRNA secondary structure alone cannot account for the expression fate of proteins across hosts. Instead, we identified a previously unrecognized factor, designated N-terminal-specified Incompatible Codons (NICs), that correlate decisively with protein non-expression. Library mining revealed the full set of NICs in E. coli, spanning 13 of the 20 amino acids and including GTC for Val, AAG for Lys, ACC for Thr and CAG for Gln. Further studies showed that these NICs are independent of codon rarity, tRNA abundance, and mRNA folding, but correlate with wobble-base usage. Ribosome loading experiments demonstrated that NICs act as kinetic barriers that prevent ribosome loading, thereby blocking the transition from initiation and early elongation to efficient elongation. We further performed deep DIA-based LC–MS/MS analysis of intracellular soluble proteins and quantified protein abundance, obtaining a quantitative snapshot of the endogenous proteome. From the 500 most abundant proteins in E. coli, we derived the N-terminal optimal codons (NOCs). By integrating NIC avoidance and NOC preference, we designed a new two-section codon-usage strategy for heterologous protein overexpression in prokaryotes: a one-to-one (one-amino-acid, one-codon) codon table for the first 48 bp of the N-terminal coding region, and a mixed table (optimal codon utilization plus rare codon substitution) for the subsequent sequence. Using this approach, eight previously non-expressed proteins, including a lipase, a laccase and a cysteine hydrolase, were successfully overexpressed in E. coli. An online codon-design tool, RiboLoad Codon Optimizer, is available at http://47.86.169.8/ and supports the overexpression of numerous proteins and enzymes in E. coli and R. ruber hosts. Together, these findings establish a mechanism-grounded codon-usage framework for overcoming translational bottlenecks and enabling efficient heterologous protein expression in prokaryotic hosts.
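As a rough illustration of the N-terminal section of this strategy, the Python sketch below scans the first 48 nt of a coding sequence for NICs and recodes them. It is not the RiboLoad Codon Optimizer implementation. It assumes that the 48 bp window starts at the first base of the start codon, it knows only the four NIC examples named in the abstract rather than the full set, and the function names and the replacement mapping are hypothetical; the actual NOC table is not given here and must be supplied by the user.

```python
# Only the four E. coli NIC examples named in the abstract; the full set
# (13 amino acids) is not listed there.
EXAMPLE_NICS = {"GTC": "Val", "AAG": "Lys", "ACC": "Thr", "CAG": "Gln"}

# Window length from the abstract. Counting it from the first base of the
# start codon is an assumption of this sketch.
N_TERMINAL_NT = 48


def _window_codons(cds, window_nt):
    """Split the in-frame part of the N-terminal window into codons."""
    end = min(window_nt, len(cds))
    end -= end % 3  # drop any trailing partial codon
    return [cds[i:i + 3] for i in range(0, end, 3)], end


def find_nics(cds, nics=EXAMPLE_NICS, window_nt=N_TERMINAL_NT):
    """Return (codon_number, codon, amino_acid) for each NIC in the window."""
    cds = cds.upper().replace("U", "T")
    codons, _ = _window_codons(cds, window_nt)
    return [(n, c, nics[c]) for n, c in enumerate(codons, start=1) if c in nics]


def recode_window(cds, replacements, window_nt=N_TERMINAL_NT):
    """Swap codons inside the window using a user-supplied mapping.

    `replacements` maps a codon to a synonymous codon. The caller is
    responsible for synonymy and for choosing the preferred codons.
    """
    cds = cds.upper().replace("U", "T")
    codons, end = _window_codons(cds, window_nt)
    recoded = "".join(replacements.get(c, c) for c in codons)
    return recoded + cds[end:]  # downstream section is left untouched


if __name__ == "__main__":
    seq = "ATGAAGGTTACC"
    print(find_nics(seq))
    # Placeholder mapping: AAA is synonymous with AAG (Lys), but whether it
    # is the preferred N-terminal codon is NOT stated in the abstract.
    print(recode_window(seq, {"AAG": "AAA"}))
```

The downstream section, which the abstract describes as using a mixed table, is deliberately left unchanged here because that table's contents are not specified.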
