EvoFlow-RNA: Generating and Representing non-coding RNA with a Language Model
Abstract
RNA plays a critical role across numerous biological functions. Recent advances in language modeling show promise for representing RNA, but the possibility of large-scale RNA design and optimization has not been fully explored. We propose EvoFlow-RNA, a bidirectional non-coding RNA language model that leverages a masked discrete diffusion model (MDM) formulation for both generative modeling and representation learning. EvoFlow-RNA bridges the gap between RNA sequence representation and design. It outperforms leading RNA models on six BEACON tasks critical to understanding RNA function, such as secondary structure prediction. In unconditional generation, it synthesizes diverse RNA sequences with native-like biophysical properties. Furthermore, EvoFlow-RNA can optimize aptamer sequences while preserving binding recognition sites. Our results demonstrate the effectiveness of EvoFlow-RNA in RNA modeling and highlight the capability and potential of masked discrete diffusion for both recapitulating and enhancing existing RNAs.
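For readers unfamiliar with masked discrete diffusion, the sketch below shows the generic recipe over an RNA alphabet. It is not EvoFlow-RNA's implementation, and every name in it is hypothetical. The forward process independently masks each nucleotide with probability `t`. The reverse process starts from a fully masked sequence and, at each step, reveals a share of the remaining masked positions by sampling from a denoiser's per-position distributions. `toy_denoiser` is an assumed stand-in for a trained bidirectional model and simply returns uniform probabilities. The linear "reveal ceil(masked / steps left)" schedule is likewise an illustrative assumption.

```python
import math
import random

NUCLEOTIDES = "ACGU"
MASK = "<mask>"


def forward_mask(seq, t, rng):
    """Forward (noising) process: replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in seq]


def toy_denoiser(tokens):
    """Hypothetical stand-in for a trained bidirectional model.

    Returns one probability distribution over NUCLEOTIDES per position.
    This toy version is uniform; a real model would condition on the unmasked context.
    """
    return [[1.0 / len(NUCLEOTIDES)] * len(NUCLEOTIDES) for _ in tokens]


def sample_mdm(length, num_steps, denoiser=toy_denoiser, seed=0):
    """Reverse process (assumed linear schedule, num_steps >= 1).

    Start fully masked and unmask part of the remaining masked positions
    at each step, so that no masks remain after the final step.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    for steps_left in range(num_steps, 0, -1):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Reveal ceil(|masked| / steps_left) positions; on the last step this is all of them.
        k = math.ceil(len(masked) / steps_left)
        probs = denoiser(tokens)  # predict every position from the current partial sequence
        for i in rng.sample(masked, k):
            tokens[i] = rng.choices(NUCLEOTIDES, weights=probs[i])[0]
    return "".join(tokens)


if __name__ == "__main__":
    print(sample_mdm(length=20, num_steps=5))
```

Swapping `toy_denoiser` for a real model's output distributions is the only change needed to turn this into a usable sampler. The same masked-prediction objective that trains the denoiser is also what lets a bidirectional model serve as an encoder for representation learning.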