ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations

Weijie Yin
Zhaoyu Zhang
Liang He
Rui Jiang
Shuo Zhang
Gan Liu
Xuegong Zhang
Tao Qin
Zhen Xie

Abstract

With large amounts of unlabeled RNA sequence data produced by high-throughput sequencing technologies, pre-trained RNA language models have been developed to estimate the semantic space of RNA molecules, which facilitates understanding of the grammar of the RNA language. However, existing RNA language models overlook the impact of structure when modeling the RNA semantic space, resulting in incomplete feature extraction and suboptimal performance across various downstream tasks. In this study, we developed an RNA pre-trained language model named ERNIE-RNA (Enhanced Representations with base-pairing restriction for RNA modeling), based on a modified BERT (Bidirectional Encoder Representations from Transformers) that incorporates a base-pairing restriction and uses no MSA (Multiple Sequence Alignment) information. We found that, in a zero-shot experiment, the attention maps of ERNIE-RNA without fine-tuning capture RNA structure more precisely than conventional methods such as fine-tuned RNAfold and RNAstructure, suggesting that ERNIE-RNA can provide comprehensive RNA structural representations. Furthermore, after fine-tuning, ERNIE-RNA achieved state-of-the-art (SOTA) performance on various downstream tasks, including RNA structure and function prediction. In summary, ERNIE-RNA provides general-purpose features that can be widely and effectively applied to a range of downstream research tasks. Our results indicate that introducing key knowledge-based prior information into the BERT framework may be a useful strategy for enhancing the performance of other language models.
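The abstract does not spell out how the base-pairing restriction enters the model, so the following Python is only a minimal sketch under stated assumptions: a pairwise matrix scores which bases could pair (the values in `PAIR_SCORES` are illustrative, not the paper's), that matrix is added as a bias to scaled dot-product attention logits, and a greedy decoder reads candidate base pairs off a symmetrized attention map, loosely mirroring the zero-shot structure readout. `pairing_prior`, `biased_attention`, and `pairs_from_attention` are hypothetical names, and none of this is presented as ERNIE-RNA's actual implementation.

```python
import math

# ASSUMPTION (illustrative values, not taken from ERNIE-RNA): canonical
# Watson-Crick pairs score higher than the G-U wobble pair; all others score 0.
PAIR_SCORES = {
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("G", "U"): 0.8, ("U", "G"): 0.8,
}


def pairing_prior(seq):
    """Hypothetical helper: L x L matrix whose (i, j) entry scores whether base i can pair with base j."""
    n = len(seq)
    return [[PAIR_SCORES.get((seq[i], seq[j]), 0.0) for j in range(n)] for i in range(n)]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, prior, bias_weight=1.0):
    """Scaled dot-product attention weights with the pairing prior added to the logits.

    q and k are lists of per-position vectors; returns one softmax row per query.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias_weight * prior[i][j]
            for j, kj in enumerate(k)
        ]
        weights.append(softmax(logits))
    return weights


def pairs_from_attention(attn, threshold, min_loop=3):
    """Hypothetical zero-shot decoder: greedily read base pairs off a symmetrized attention map.

    Each position joins at most one pair, paired bases must have at least
    min_loop bases between them, and crossing pairs (pseudoknots) are not filtered out.
    """
    n = len(attn)
    sym = [[(attn[i][j] + attn[j][i]) / 2 for j in range(n)] for i in range(n)]
    candidates = sorted(
        ((sym[i][j], i, j) for i in range(n) for j in range(i + min_loop + 1, n)),
        reverse=True,
    )
    used, pairs = set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates all score lower
        if i in used or j in used:
            continue
        pairs.append((i, j))
        used.update((i, j))
    return sorted(pairs)
```

Adding the prior to the logits (rather than hard-masking non-pairing positions) keeps every position reachable while tilting attention toward chemically plausible partners; the bias weight, the threshold, and `min_loop` are arbitrary choices in this sketch.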

Version published to bioRxiv on Mar 17, 2024 (DOI: 10.1101/2024.03.17.585376)

Related articles

In-Context Learning in Genomic Language Models as a Biological Evaluation Task

This article has 2 authors:
1. Aadit Kapoor
2. Wendy Lee
This article has no evaluations. Latest version Dec 9, 2025.

Explicit Dynamic Cross-Strand Interactions for DNA Sequence Language Modeling

This article has 12 authors:
1. Xiao Luo
2. Cheng Yang
3. Yuansheng Liu
4. Lei Ling
5. Fengxin Li
6. Changjian Chen
7. Long Wang
8. Feng Yu
9. Liang Qiao
10. Xiangxiang Zeng
11. Kenli Li
12. Alexander Schönhuth
This article has no evaluations. Latest version Jan 8, 2026.

Emergence of Biological Structural Discovery in General-Purpose Language Models

This article has 1 author:
1. Liang Wang
This article has no evaluations. Latest version Jan 8, 2026.
