mRNABench: A curated benchmark for mature mRNA property and function prediction
Abstract
Messenger RNA (mRNA) is central to gene expression, and its half-life, localization, and translation efficiency drive phenotypic diversity in eukaryotic cells. While supervised learning has been widely used to study the mRNA regulatory code, self-supervised foundation models support a wider range of transfer learning tasks. However, the dearth and homogeneity of standardized benchmarks limit efforts to pinpoint the strengths of various models. Here, we present mRNABench, a comprehensive benchmarking suite for mature mRNA biology that evaluates the representational quality of mature mRNA embeddings from self-supervised nucleotide foundation models. We curate ten datasets and 59 prediction tasks that broadly capture salient properties of mature mRNA, and assess the performance of 18 families of nucleotide foundation models for a total of 135K experiments. Using these experiments, we study parameter scaling, compositional generalization from learned biological features, and correlations between sequence compressibility and performance. We identify synergies between two self-supervised learning objectives, and pre-train a new Mamba-based model that achieves state-of-the-art performance using 700x fewer parameters. mRNABench can be found at: https://github.com/morrislab/mRNABench.