ProtmRNA: Cross-Modal Knowledge Transfer from Proteins to Messenger RNA
Abstract
Motivation
According to the central dogma of molecular biology, messenger RNA (mRNA) sequences are directly translated into amino acid sequences, positioning mRNA as the fundamental intermediary between genetic information and functional proteins. This natural correspondence suggests that mRNA sequence analysis could greatly benefit from the rich evolutionary and functional representations learned by large-scale protein language models.
Results
ProtmRNA repurposes the pre-trained ESM-2 protein language model for mRNA sequence processing via cross-modal transfer learning. Evaluated on mRNA- and protein-related datasets, along with eight additional benchmarks compiled in this study, ProtmRNA achieves performance comparable to or better than that of state-of-the-art mRNA language models while using less than half the pre-training computational resources. By demonstrating that protein-derived knowledge can be efficiently transferred to mRNA, this work establishes the potential of cross-modal transfer learning between biological sequences and offers a resource-efficient paradigm for advancing mRNA sequence understanding.
Availability and Implementation
The pre-trained ProtmRNA model and the eight CDS-region regression benchmarks curated in this study are publicly available at https://github.com/pesenteur/ProtmRNA .