codonGPT: Reinforcement learning on a generative language model optimizes RNA sequences under biological constraints

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Emerging generative models for biology focus on DNA, non-coding RNA, or proteins, ignoring information hidden in mRNA. Additionally, in protein engineering and mRNA therapeutics the design of mRNA sequences is still a challenge, lacking a clear framework. Here, we introduce and rigorously evaluate two novel methods: a foundational model for mRNA and a reinforcement learning mRNA design framework built on such a model. codonGPT is the first generative foundational language model trained directly on coding mRNA sequences. To solve the problem of synonymous constraints that are only unique to mRNA, we introduce a novel method of inference-time masking, along with house-keeping genes evaluation. For the first time, we also rigorously demonstrate, that for precise mRNA therapeutics design, reinforcement learning on such a model provides a clear framework for biological optimization. Our study introduces a novel foundational model for mRNA and a new reinforcement learning based paradigm for mRNA sequence design.

Article activity feed