Learning native-like codons with a 5’UTR- and RNA-secondary-structure-aided, species-informed transformer model

Abstract

Efficient protein expression across heterologous hosts remains a major challenge in synthetic biology, largely due to species-specific differences in codon usage and regulatory sequence context. A key difficulty lies in reconstructing the codon landscape of the target expression system within a foreign host while preserving native-like codon preferences. To address this, we present TransCodon, a Transformer-based deep learning model that leverages both 5’ untranslated regions (5’UTRs) and coding sequences (CDS), along with explicit species identifiers and RNA secondary structure information, to learn nuanced codon usage patterns across diverse organisms. By incorporating multi-source genomic data and modeling sequence dependencies in a masked language modeling paradigm, TransCodon effectively captures both local and global determinants of codon preference. Our experiments demonstrate that integrating species-level information during training significantly improves the model’s ability to predict optimal synonymous codons across different evaluation metrics. More importantly, it identifies native-like codons with less divergence from natural sequences than other methods. In addition, TransCodon captures more of the low-frequency codons that are often omitted by other deep learning-based methods. These results indicate that TransCodon, as a robust codon language model, has the potential to generate native-like CDS with high translational efficiency in target hosts.
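To make the training setup described above concrete, the following is a minimal Python (PyTorch) sketch of codon-level masked language modeling conditioned on an explicit species token. Everything here is an illustrative assumption rather than the authors’ implementation: the token vocabulary, the species tokens such as <E.coli>, the CodonMLM module, the masking rate, and the simplification of tokenizing the 5’UTR into triplets are all hypothetical, and the paper’s RNA secondary structure features are only noted in a comment.

import torch
import torch.nn as nn

# Hypothetical vocabulary: 64 codons plus special and species tokens.
CODONS = [a + b + c for a in "ACGU" for b in "ACGU" for c in "ACGU"]
SPECIAL = ["<pad>", "<mask>", "<utr>", "<cds>"]
SPECIES = ["<E.coli>", "<S.cerevisiae>", "<H.sapiens>"]  # assumed examples
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + SPECIES + CODONS)}

def encode(species, utr_triplets, cds_codons):
    """Prepend a species identifier, then 5'UTR and CDS segments.
    Treating the 5'UTR as triplet tokens is a simplifying assumption."""
    toks = [species, "<utr>"] + utr_triplets + ["<cds>"] + cds_codons
    return torch.tensor([VOCAB[t] for t in toks])

def mask_tokens(ids, mask_rate=0.15):
    """BERT-style masking restricted to codon positions; species and
    segment markers are never masked."""
    labels = ids.clone()
    codon_start = len(SPECIAL) + len(SPECIES)
    maskable = ids >= codon_start
    selected = maskable & (torch.rand(ids.shape) < mask_rate)
    if not selected.any():                     # guarantee one training target
        selected[maskable.nonzero()[0]] = True
    labels[~selected] = -100                   # ignore unmasked positions
    masked = ids.clone()
    masked[selected] = VOCAB["<mask>"]
    return masked, labels

class CodonMLM(nn.Module):
    """Tiny Transformer encoder predicting masked codons (illustrative only).
    RNA secondary structure could enter as an extra per-position embedding,
    e.g. a paired/unpaired state from a dot-bracket string (not shown)."""
    def __init__(self, vocab_size, d_model=64, nhead=4, nlayers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):                    # ids: (seq_len,)
        pos = torch.arange(ids.size(0))
        x = (self.tok(ids) + self.pos(pos)).unsqueeze(0)  # add batch dim
        return self.head(self.encoder(x)).squeeze(0)      # (seq_len, vocab)

# Usage: mask part of a toy sequence and compute the MLM loss.
ids = encode("<E.coli>", ["GGG", "CGC"], ["AUG", "GCU", "GCC", "UAA"])
masked, labels = mask_tokens(ids, mask_rate=0.3)
loss = nn.functional.cross_entropy(CodonMLM(len(VOCAB))(masked), labels,
                                   ignore_index=-100)
print(loss.item())

Placing the species token at the start of the sequence lets self-attention condition every masked-codon prediction on the host identity, which is one plausible way to realize the species-informed training the abstract describes.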
