EvoFlow-RNA: Generating and Representing non-coding RNA with a Language Model

Abstract

RNA plays a critical role across numerous biological functions. Recent advances in language modeling show promise for representing RNA, but the possibility of large-scale RNA design and optimization has not been fully explored. We propose EvoFlow-RNA, a bidirectional non-coding RNA language model that uses a masked discrete diffusion model (MDM) formulation for both generative modeling and representation learning. EvoFlow-RNA bridges the gap between RNA sequence representation and design. It outperforms leading RNA models on three BEACON tasks critical to understanding RNA function, ranging from structure prediction to gene editing. In unconditional generation, it synthesizes diverse RNA sequences with native-like structural and binding properties. Additionally, EvoFlow-RNA can globally redesign aptamer sequences around preserved binding recognition sites to enhance their functionality. Our results demonstrate the effectiveness of EvoFlow-RNA in RNA modeling and highlight the potential of masked discrete diffusion for both recapitulating and enhancing existing RNAs.
