ProtRNA: A Protein-derived RNA Language Model by Cross-Modality Transfer Learning
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Protein language models (PLM), such as the highly successful ESM-2, have proven to be particularly effective. However, language models designed for RNA continue to face challenges. A key question is: can the information derived from PLMs be harnessed and transferred to RNA? To investigate this, a model termed ProtRNA has been developed by cross-modality transfer learning strategy for addressing the challenges posed by RNA’s limited and less conserved sequences. By leveraging the evolutionary and physicochemical information encoded in protein sequences, the ESM-2 model is adapted to processing "low-resource" RNA sequence data. The results show comparable or even superior performance in various RNA downstream tasks, with only 1/8 the trainable parameters and 1/6 the training data employed by other baseline RNA language models. This approach highlights the potential of cross-modality transfer learning in biological language models.