Yorzoi: Predicting RNA-seq coverage from DNA sequence in yeast

Abstract

Yeast is one of the principal model organisms in synthetic biology and a widely used chassis for testing heterologous DNA sequences. However, correct design of DNA constructs is currently limited by our incomplete quantitative understanding of how sequences, especially non-native ones, are transcribed and translated. Here, we present Yorzoi, a sequence-to-expression model for baker's yeast that predicts RNA-seq coverage for a 5-kilobase (kb), multi-gene window of DNA at 10 base pair (bp) resolution. To extend its predictive ability beyond native DNA, Yorzoi was pretrained on a comprehensive dataset that includes not only native yeast sequences but also human sequences expressed in yeast and structurally rearranged synthetic yeast chromosomes. The model achieves high predictive performance on a range of downstream tasks, which indicates that it has learned general rules of yeast transcription. Yorzoi is a powerful tool for in-silico testing of DNA sequences and is directly applicable to sequence design. A web application for the model is available at yorzoi.eu, and the code is open source on GitHub.
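The abstract fixes two numbers: a 5 kb input window and 10 bp output resolution. Together these imply 5,000 / 10 = 500 coverage values per window. The sketch below only illustrates that arithmetic and the general shape of a sequence-to-coverage task. It assumes one-hot encoding of the input and mean pooling of per-base coverage into bins. Both choices, and the names `one_hot` and `bin_coverage`, are hypothetical and not taken from Yorzoi's code.

```python
WINDOW_BP = 5_000  # input window length stated in the abstract
BIN_BP = 10        # output resolution stated in the abstract
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Hypothetical input encoding: one 4-vector per base (A, C, G, T).

    Assumption: ambiguous symbols such as N become all-zero vectors.
    """
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def bin_coverage(per_base: list[float], bin_bp: int = BIN_BP) -> list[float]:
    """Hypothetical target construction: mean coverage over consecutive,
    non-overlapping bins of `bin_bp` bases (mean pooling is an assumption)."""
    if len(per_base) % bin_bp != 0:
        raise ValueError("coverage length must be a multiple of the bin size")
    return [
        sum(per_base[i:i + bin_bp]) / bin_bp
        for i in range(0, len(per_base), bin_bp)
    ]


# Number of output positions per 5 kb window at 10 bp resolution.
N_BINS = WINDOW_BP // BIN_BP

if __name__ == "__main__":
    seq = "ACGT" * (WINDOW_BP // 4)        # toy 5 kb sequence
    x = one_hot(seq)                       # 5000 rows x 4 columns
    y = bin_coverage([1.0] * WINDOW_BP)    # 500 binned coverage values
    print(len(x), len(x[0]), len(y), N_BINS)
```

Pooling to 10 bp bins shrinks the prediction target 10-fold compared with per-base output while keeping fine positional detail within a multi-gene window.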

Article activity feed