Modular Deep Learning for Direct RNA Sequence Design via Self-Contained RNA Units


Abstract

RNA sequence design is a pivotal challenge in synthetic biology, yet state-of-the-art deep learning methods face a fundamental bottleneck: the scarcity of high-resolution 3D structures. To compensate for limited training data, existing approaches like NA-MPNN and RiboDiffusion employ computationally expensive autoregressive or iterative diffusion sampling, substantially limiting their throughput and scalability. In this work, we propose that this data limitation is largely a problem of accessibility and granularity. We introduce SCRU-DB, a comprehensive database that systematically decomposes complex RNAs into over 61,000 Self-contained RNA Units (SCRUs). This scale far exceeds previous RNA motif libraries, capturing over 8,200 unique structural clusters. Crucially, SCRUs are rigorously defined as structurally autonomous modules identified via tertiary contact clustering, ensuring they act as self-stabilizing, foldable physical units. Leveraging this massive, modular prior, we present SCRU-Seq (a direct, O(1) prediction GNN) and SCRU-Diff (an iterative diffusion model). On our high-fidelity set112 benchmark, SCRU-Seq achieves a native sequence recovery (NSR) of 63.7%, while SCRU-Diff reaches a superior Best NSR of 79.2%. We demonstrate high structural fidelity via 3D backbone superposition using the C4′ RMSD (reaching 1.5 Å for complex targets) and validate the structural isomorphism of our modular fragments. This framework provides a scalable, physically grounded solution for generating diverse and structurally accurate RNA sequences.
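
The abstract does not spell out the formula behind the NSR figures, so the following is only a minimal sketch under the common convention: NSR is taken to be the fraction of positions at which a designed sequence matches the native one, and Best NSR is assumed to be the maximum NSR over a set of sampled candidates. The function names and the example sequences are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def native_sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one.

    Assumption: both sequences are aligned position by position and have
    the same non-zero length (no gaps or insertions are handled).
    """
    if not native:
        raise ValueError("native sequence must not be empty")
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)


def best_nsr(candidates: Iterable[str], native: str) -> float:
    """Highest NSR among several sampled designs (assumed meaning of 'Best NSR')."""
    return max(native_sequence_recovery(c, native) for c in candidates)


if __name__ == "__main__":
    native = "GGCA"                      # hypothetical native sequence
    samples = ["AAAA", "GGCU", "GGCA"]   # hypothetical sampled designs
    print(native_sequence_recovery("GGCU", native))
    print(best_nsr(samples, native))
```

Under this reading, a single-pass model yields one NSR value per target, whereas a sampling model can report the best value across several draws, which is one reason the two figures are not directly comparable.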
