A Unified Framework for Model-Informed and Agentic RNA Design

Abstract

End-to-end, machine-learning-based design of mRNA molecules offers a powerful means to tailor their properties for specific tasks. mRNA expression level, immunogenicity, tissue specificity, stability, and localization strongly depend on sequence, providing a rich set of properties amenable to optimization. Despite this potential, the components of mRNA are governed by distinct grammatical and functional rules that hinder a unified approach to complete mRNA design. While machine learning and generative AI techniques can excel on individual sequence design tasks, out-of-distribution design, where the biological objective shifts substantially from the original training data, remains difficult. Moreover, there is a disconnect between available sequence generation technologies and the diverse body of biological datasets needed to form and test mechanistic hypotheses. In this work, we describe a simple and powerful alteration to integrated gradients (Design by Integrated Gradients or DIGs) that serves as the foundation for several mRNA design tasks and an agentic hypothesis engine, the Structured RNA Evidence Aggregation Module (STREAM), which enables rapid adaptation of this technique to new contexts. Using this framework, we demonstrate complete model-informed mRNA design and reveal the underexplored rules governing the assembly of mRNA components into high-performance transcripts. By linking neural-network-based design to independent datasets, we design complete mRNA sequences in shifted settings, culminating in up to 6-fold increases in intramuscular expression compared to state-of-the-art methods in vivo. Together, DIGs and STREAM enable automated mRNA design in increasingly complex settings.
