A Unified Framework for Model-Informed and Agentic RNA Design
Abstract
End-to-end, machine-learning-based design of mRNA molecules offers a powerful means to tailor their properties for specific tasks. mRNA expression level, immunogenicity, tissue specificity, stability, and localization all depend strongly on sequence, providing a rich set of properties amenable to optimization. Despite this potential, the components of an mRNA are governed by distinct grammatical and functional rules, which hinders a unified approach to complete mRNA design. While machine learning and generative AI techniques can excel at individual sequence design tasks, out-of-distribution design, where the biological objective shifts substantially from the original training data, remains difficult. Moreover, there is a disconnect between available sequence generation technologies and the diverse body of biological datasets needed to form and test mechanistic hypotheses. In this work, we describe two components: a simple and powerful alteration to integrated gradients, Design by Integrated Gradients (DIGs), that serves as the foundation for several mRNA design tasks; and an agentic hypothesis engine, the Structured RNA Evidence Aggregation Module (STREAM), which enables rapid adaptation of this technique to new contexts. Using this framework, we demonstrate complete model-informed mRNA design and reveal the underexplored rules governing the assembly of mRNA components into high-performance transcripts. By linking neural-network-based design to independent datasets, we design complete mRNA sequences in shifted settings, culminating in up to 6-fold higher intramuscular expression in vivo than state-of-the-art methods. Together, DIGs and STREAM enable automated mRNA design in increasingly complex settings.
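For readers unfamiliar with the attribution method that DIGs builds on, the following is a minimal sketch of standard integrated gradients only. It does not reproduce the DIGs alteration, which the abstract does not specify. The toy logistic "model", the weight matrix `W`, the function names, and the uniform-nucleotide baseline are all hypothetical choices made for illustration.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Standard integrated gradients, approximated with a midpoint Riemann sum."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Gradient of the model output at each point on the straight-line path
    # from the baseline to the input.
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    # Average path gradient, scaled elementwise by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical toy model (an assumption, not the paper's network):
# a logistic score over a one-hot encoded 4-nucleotide sequence.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # rows: positions, columns: A, C, G, U

def score(x):
    return 1.0 / (1.0 + np.exp(-(W * x).sum()))

def score_grad(x):
    # Analytic gradient of the logistic score with respect to the input.
    s = score(x)
    return s * (1.0 - s) * W

x = np.eye(4)[[0, 2, 3, 1]]        # one-hot encoding of "AGUC"
baseline = np.full((4, 4), 0.25)   # assumed uniform-nucleotide baseline
attributions = integrated_gradients(score_grad, x, baseline)
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum approximately to `score(x) - score(baseline)`.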