EvoFlow-RNA: Generating and Representing non-coding RNA with a Language Model
Abstract
RNA plays a critical role in numerous biological functions. Recent advances in language modeling show promise for representing RNA, but the potential for large-scale RNA design and optimization has not been fully explored. We propose EvoFlow-RNA, a bidirectional non-coding RNA language model that leverages a masked discrete diffusion model (MDM) formulation for both generative modeling and representation learning. EvoFlow-RNA bridges the gap between RNA sequence representation and design. It outperforms leading RNA models on three BEACON tasks critical to understanding RNA function, ranging from structure prediction to gene editing. In unconditional generation, it synthesizes diverse RNA sequences with native-like structural and binding properties. Additionally, EvoFlow-RNA can globally redesign aptamer sequences around preserved binding recognition sites, yielding enhanced functionality. Our results demonstrate the effectiveness of EvoFlow-RNA for RNA modeling and highlight the capability and potential of masked discrete diffusion for both recapitulating and enhancing existing RNAs.
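To make the masked discrete diffusion idea concrete, here is a minimal sketch of the generic technique. It is not EvoFlow-RNA's implementation. The forward process replaces each nucleotide with a mask token with probability t. The reverse process starts from a fully masked sequence and unmasks positions over a fixed number of steps. Holding selected positions fixed mimics redesigning a sequence around a preserved binding site. The mask symbol `_`, the function names, the random unmasking order, the equal-share schedule, and the `uniform_denoiser` (a stand-in for a trained model's per-position predictions) are all hypothetical choices made for illustration.

```python
import random

NUCLEOTIDES = "ACGU"
MASK = "_"  # hypothetical mask symbol used only in this sketch


def forward_mask(seq: str, t: float, rng: random.Random) -> str:
    """Forward (noising) process: mask each position independently with probability t."""
    return "".join(MASK if rng.random() < t else ch for ch in seq)


def uniform_denoiser(seq: str) -> list[dict[str, float]]:
    """Hypothetical placeholder for a trained model: uniform nucleotide probabilities at every position."""
    return [{nt: 1.0 / len(NUCLEOTIDES) for nt in NUCLEOTIDES} for _ in seq]


def sample(length, denoiser, steps, rng, fixed=None):
    """Reverse process: start fully masked (except fixed positions) and unmask a share of positions per step."""
    fixed = fixed or {}
    seq = [fixed.get(i, MASK) for i in range(length)]
    for step in range(steps):
        masked = [i for i, ch in enumerate(seq) if ch == MASK]
        if not masked:
            break
        # Unmask an equal share of the remaining masked positions; the last step unmasks all of them.
        k = -(-len(masked) // (steps - step))  # ceiling division
        probs = denoiser("".join(seq))
        for i in rng.sample(masked, k):
            nts, weights = zip(*probs[i].items())
            seq[i] = rng.choices(nts, weights=weights)[0]
    return "".join(seq)


rng = random.Random(0)
print(forward_mask("GGACUUCGGUCC", 0.5, rng))
# Keep a hypothetical 4-nt recognition site at positions 4-7 and redesign everything else.
site = {4: "G", 5: "A", 6: "A", 7: "A"}
print(sample(12, uniform_denoiser, steps=4, rng=rng, fixed=site))
```

In a real model the denoiser would condition on the partially unmasked sequence. Many MDM samplers also unmask the most confident positions first rather than choosing them at random.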