A Unified Framework for Model-Informed and Agentic RNA Design

Aidan T. Riley
McKayla Vlasity
Wyatt M. Becicka
Joey Zhuoying Huang
Sicheng Pang
Wilson W. Wong
Mark W. Grinstaff
Alexander A. Green

Abstract

End-to-end, machine-learning-based design of mRNA molecules offers a powerful means to tailor their properties for specific tasks. mRNA expression level, immunogenicity, tissue specificity, stability, and localization strongly depend on sequence, providing a rich set of properties amenable to optimization. Despite this potential, the components of mRNA are governed by distinct grammatical and functional rules that hinder a unified approach to complete mRNA design. While machine learning and generative AI techniques can excel on individual sequence design tasks, out-of-distribution design, where the biological objective shifts substantially from the original training data, remains difficult. Moreover, there is a disconnect between available sequence generation technologies and the diverse body of biological datasets needed to form and test mechanistic hypotheses. In this work, we describe a simple and powerful alteration to integrated gradients (Design by Integrated Gradients or DIGs) that serves as the foundation for several mRNA design tasks and an agentic hypothesis engine, the Structured RNA Evidence Aggregation Module (STREAM), which enables rapid adaptation of this technique to new contexts. Using this framework, we demonstrate complete model-informed mRNA design and reveal the underexplored rules governing the assembly of mRNA components into high-performance transcripts. By linking neural-network-based design to independent datasets, we design complete mRNA sequences in shifted settings, culminating in up to 6-fold increases in intramuscular expression compared to state-of-the-art methods in vivo. Together, DIGs and STREAM enable automated mRNA design in increasingly complex settings.

Version published to 10.1101/2025.06.17.659751 on bioRxiv
Jun 17, 2025
