NucleicBERT: Deciphering the language of nucleic acids by a large-language model

Abstract

The vast majority of the human genome comprises non-protein-coding regions whose structural and functional roles remain poorly understood. Many of these regions function through RNA, yet progress in deep learning for RNA has lagged behind that for proteins because most methods rely on abundant structural labels or evolutionary alignments, both of which are sparse for RNA. To address these challenges, we developed NucleicBERT, a self-supervised masked-language model that learns contextual representations capturing local and distal dependencies without requiring alignments or evolutionary information. Explainable AI analysis reveals that the model clusters RNA types in latent space and attends to structural properties such as secondary structure and tertiary contacts, effectively “rediscovering” RNA biology from sequence correlations alone. When fine-tuned for downstream structural and functional tasks, NucleicBERT requires only single sequences, yet surpasses current state-of-the-art RNA models. This alignment-free framework addresses the scarcity of annotated 3D RNA data while providing a rapid, computational complement to experimental techniques. By bridging abundant unlabeled primary sequence corpora with far scarcer structural annotations, NucleicBERT advances RNA structure prediction and provides insights into the inner workings of LLMs. NucleicBERT is available at https://github.com/KIT-MBS/NucleicBERT.
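
As a rough illustration of the masked-language-modelling setup described above, the sketch below pretrains a tiny transformer encoder to recover randomly masked nucleotides from single RNA sequences. The vocabulary, masking rate, model sizes, and all names here are illustrative assumptions for this sketch, not NucleicBERT's actual architecture or code; see the repository linked above for the real implementation.

# Minimal masked-LM pretraining sketch for RNA sequences (illustrative only;
# hyperparameters and vocabulary are assumptions, not NucleicBERT's).
import torch
import torch.nn as nn

# Nucleotide vocabulary plus special tokens (assumed for illustration).
VOCAB = {"[PAD]": 0, "[MASK]": 1, "A": 2, "C": 3, "G": 4, "U": 5}
MASK_ID, PAD_ID = VOCAB["[MASK]"], VOCAB["[PAD]"]

def encode(seq: str) -> torch.Tensor:
    """Map an RNA string to a tensor of token ids."""
    return torch.tensor([VOCAB[c] for c in seq], dtype=torch.long)

def mask_tokens(ids: torch.Tensor, p: float = 0.15):
    """Replace a random fraction of tokens with [MASK]; labels are -100
    (ignored by the loss) everywhere except at the masked positions."""
    labels = ids.clone()
    mask = (torch.rand(ids.shape) < p) & (ids != PAD_ID)
    labels[~mask] = -100
    corrupted = ids.clone()
    corrupted[mask] = MASK_ID
    return corrupted, labels

class TinyRNAEncoder(nn.Module):
    """A small transformer encoder with a token-prediction head.
    Positional encodings are omitted here for brevity; a BERT-style
    model would include them."""

    def __init__(self, vocab_size=len(VOCAB), d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

# One illustrative training step on a toy batch of RNA sequences.
model = TinyRNAEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

batch = torch.stack([encode("GGGAAACUUUCGGGAAACCC"), encode("AUGGCUACGUAGCUAGCGAU")])
inputs, labels = mask_tokens(batch)
optimizer.zero_grad()
logits = model(inputs)                      # (batch, length, vocab)
loss = loss_fn(logits.reshape(-1, len(VOCAB)), labels.reshape(-1))
loss.backward()
optimizer.step()
print(f"masked-LM loss: {loss.item():.3f}")

Fine-tuning for a downstream structural or functional task would, in the same spirit, replace the token-prediction head with a task-specific head while keeping the pretrained encoder.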
