Protein language models reveal evolutionary constraints on synonymous codon choice

Abstract

Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constraints on codon choice related to the process of translation. The structure and function of a protein may impose pressure to translate the associated mRNA at a particular speed in order to enable proper protein production, but the molecular basis and scope of these evolutionary constraints have remained elusive. Here, we show that information about codon constraints can be extracted from protein sequence alone. We leverage a protein language model to predict codon choice from amino acid sequence, combining implicit information about position and protein structure to learn subtle but generalizable constraints on codon choice in yeast. In parallel, we conduct a genome-wide screen of thousands of synonymous codon substitutions in endogenous loci in yeast, reliably identifying a small set of several hundred synonymous variants that increase or decrease fitness while showing that most positions have no measurable effect on growth. Our results suggest that cotranslational localization and translational accuracy, more than cotranslational protein folding, are major drivers of selective pressure on codon choice in eukaryotes. By considering both the small but wide-reaching effects of codon choice that can be learned from evolution and the strong but highly specific effects determined via experiment, we expose unappreciated biological constraints on codon choice.
