Selective State Space Models Outperform Transformers at Predicting RNA-Seq Read Coverage

Abstract

Transformers are the basis for many state-of-the-art machine learning tools, including those for predicting gene expression data from DNA sequence. The considerable time and cost of training transformer models have motivated the development of alternative approaches inspired by ideas from the signal-processing literature, such as state-space models (Mamba), Fourier transforms (Hyena), and wavelet transforms (MultiResNet). To evaluate these methods as potential replacements for (or complements to) attention, we developed a software library, bilby, implemented in Python using Jax/Flax, providing convolutional, attention, bidirectional Hyena, bidirectional Mamba, and striped-architecture models for supervised multi-task learning in functional genomics. We report a comparison of these architectures, testing several hyperparameters and variations, and report performance statistics for the withheld test set as well as for downstream SNP classifiers. Relative to models comprising convolution and attention layers (implemented in Python and TensorFlow via the Baskerville library used by the Borzoi software), models comprising convolutional, bidirectional Mamba, and (optionally) attention layers achieve small but consistent improvements in prediction accuracy, for roughly comparable training times and parameter counts, when averaged across all output tracks and data splits (a proportional increase of 3-4% in Pearson R and 1-2% in r², with the highest gains achieved when Mamba and attention layers were combined in a striped architecture). In contrast, Hyena (when reimplemented as described in the literature) was not competitive with attention-based models at these tasks, while MultiResNet proved too slow to be practical. The gains in prediction accuracy of the Mamba-based models do not yet translate into significantly improved performance on downstream SNP classification tasks: benchmarks using a GTEx eQTL dataset yield roughly similar results for Mamba- and attention-based classifiers, with attention marginally outperforming Mamba on one metric (a difference of +0.007 in area under the ROC curve) and slightly underperforming on another (a difference of −0.006 in Spearman rank correlation). We argue that these results suggest that selective state-space models (such as Mamba and Striped Mamba) warrant further exploration for functional genomics tasks. Our code and trained models are publicly available at https://github.com/ihh/bilby.
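
To make the striped architecture described in the abstract concrete, the sketch below is a minimal, illustrative Flax example (not the bilby implementation) of a single "striped" block: a simplified bidirectional selective state-space (Mamba-like) sublayer followed by a self-attention sublayer, each with a residual connection. The layer names, dimensions, and the reduced diagonal selective scan are assumptions for illustration only.

```python
# Illustrative sketch only; not the bilby implementation.
import jax
import jax.numpy as jnp
import flax.linen as nn


class SimpleSelectiveSSM(nn.Module):
    """Highly simplified diagonal selective state-space layer (Mamba-like)."""
    features: int      # model width
    state_size: int    # per-channel hidden-state size

    @nn.compact
    def __call__(self, x):                     # x: (batch, length, features)
        # Input-dependent ("selective") step sizes and projections.
        delta = jax.nn.softplus(nn.Dense(self.features)(x))
        B = nn.Dense(self.state_size)(x)
        C = nn.Dense(self.state_size)(x)
        # Fixed negative-real diagonal state matrix for stability.
        A = -jnp.exp(self.param("log_A", nn.initializers.zeros,
                                (self.features, self.state_size)))

        def step(h, inputs):
            x_t, d_t, B_t, C_t = inputs        # each: (batch, ...)
            decay = jnp.exp(d_t[..., None] * A)               # discretized A
            h = decay * h + (d_t * x_t)[..., None] * B_t[:, None, :]
            y_t = jnp.einsum("bfs,bs->bf", h, C_t)
            return h, y_t

        batch = x.shape[0]
        h0 = jnp.zeros((batch, self.features, self.state_size))
        # Scan along the sequence (length-major) axis.
        xs = (jnp.swapaxes(x, 0, 1), jnp.swapaxes(delta, 0, 1),
              jnp.swapaxes(B, 0, 1), jnp.swapaxes(C, 0, 1))
        _, ys = jax.lax.scan(step, h0, xs)
        return jnp.swapaxes(ys, 0, 1)          # (batch, length, features)


class StripedBlock(nn.Module):
    """One 'stripe': bidirectional SSM sublayer, then attention sublayer."""
    features: int
    state_size: int = 16
    num_heads: int = 4

    @nn.compact
    def __call__(self, x):
        # Bidirectional selective SSM with a residual connection.
        fwd = SimpleSelectiveSSM(self.features, self.state_size)(x)
        bwd = SimpleSelectiveSSM(self.features, self.state_size)(x[:, ::-1, :])
        x = x + nn.Dense(self.features)(
            jnp.concatenate([fwd, bwd[:, ::-1, :]], axis=-1))
        x = nn.LayerNorm()(x)
        # Self-attention sublayer with a residual connection.
        x = x + nn.SelfAttention(num_heads=self.num_heads)(x)
        return nn.LayerNorm()(x)


if __name__ == "__main__":
    model = StripedBlock(features=32)
    x = jnp.ones((2, 128, 32))                 # (batch, length, channels)
    params = model.init(jax.random.PRNGKey(0), x)
    y = model.apply(params, x)
    print(y.shape)                             # (2, 128, 32)
```

Bidirectionality is obtained here by running the same scan over the reversed sequence and merging the two passes; stacking several such blocks would give an alternating (striped) arrangement of state-space and attention layers, in the spirit of the striped architecture the abstract reports as most accurate.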
