Selective State Space Models Outperform Transformers at Predicting RNA-Seq Read Coverage

Abstract

Transformers are the basis for many state-of-the-art machine learning tools, including those for predicting gene expression data from DNA sequence. The considerable time and cost of training transformer models have motivated the development of alternative approaches inspired by ideas from the signal-processing literature, such as state-space models (Mamba), Fourier transforms (Hyena), and wavelet transforms (MultiResNet). To evaluate these methods as potential replacements for (or complements to) attention, we developed a software library, bilby, implemented in Python using Jax/Flax, providing convolutional, attention, bidirectional Hyena, bidirectional Mamba, and striped-architecture models for supervised multi-task learning in functional genomics. We report a comparison of these architectures, testing several hyperparameters and variations, and report performance statistics for the withheld test set as well as for downstream SNP classifiers. Relative to models comprising convolution and attention layers (implemented in Python and TensorFlow via the Baskerville library used by the Borzoi software), models comprising convolutional, bidirectional Mamba, and (optionally) attention layers achieve small but consistent improvements in prediction accuracy, for roughly comparable training times and parameter counts, when averaged across all output tracks and data splits (a proportional increase of 3-4% in Pearson R and 1-2% in r², with the highest gains achieved when Mamba and attention layers were combined in a striped architecture). In contrast, Hyena (when reimplemented as described in the literature) was not competitive with attention-based models at these tasks, while MultiResNet proved too slow to be practical. The gains in prediction accuracy of the Mamba-based models do not yet translate into significantly improved performance on downstream SNP classification tasks: benchmarks using a GTEx eQTL dataset yield roughly similar results for Mamba- and attention-based classifiers, with attention marginally outperforming Mamba on one metric (a difference of +0.007 in area under the ROC curve) and slightly underperforming on another (a difference of −0.006 in Spearman rank correlation). We argue that these results suggest that selective state-space models (such as Mamba and Striped Mamba) warrant further exploration for functional genomics tasks. Our code and trained models are publicly available at https://github.com/ihh/bilby.
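
To make the striped architecture described in the abstract concrete, the sketch below is a minimal, illustrative Flax example (not the bilby implementation) of a single "striped" block: a simplified bidirectional selective state-space (Mamba-like) sublayer followed by a self-attention sublayer, each with a residual connection. The layer names, dimensions, and the reduced diagonal selective scan are assumptions for illustration only.

```python
# Illustrative sketch only; not the bilby implementation.
import jax
import jax.numpy as jnp
import flax.linen as nn


class SimpleSelectiveSSM(nn.Module):
    """Highly simplified diagonal selective state-space layer (Mamba-like)."""
    features: int      # model width
    state_size: int    # per-channel hidden-state size

    @nn.compact
    def __call__(self, x):                     # x: (batch, length, features)
        # Input-dependent ("selective") step sizes and projections.
        delta = jax.nn.softplus(nn.Dense(self.features)(x))
        B = nn.Dense(self.state_size)(x)
        C = nn.Dense(self.state_size)(x)
        # Fixed negative-real diagonal state matrix for stability.
        A = -jnp.exp(self.param("log_A", nn.initializers.zeros,
                                (self.features, self.state_size)))

        def step(h, inputs):
            x_t, d_t, B_t, C_t = inputs        # each: (batch, ...)
            decay = jnp.exp(d_t[..., None] * A)               # discretized A
            h = decay * h + (d_t * x_t)[..., None] * B_t[:, None, :]
            y_t = jnp.einsum("bfs,bs->bf", h, C_t)
            return h, y_t

        batch = x.shape[0]
        h0 = jnp.zeros((batch, self.features, self.state_size))
        # Scan along the sequence (length-major) axis.
        xs = (jnp.swapaxes(x, 0, 1), jnp.swapaxes(delta, 0, 1),
              jnp.swapaxes(B, 0, 1), jnp.swapaxes(C, 0, 1))
        _, ys = jax.lax.scan(step, h0, xs)
        return jnp.swapaxes(ys, 0, 1)          # (batch, length, features)


class StripedBlock(nn.Module):
    """One 'stripe': bidirectional SSM sublayer, then attention sublayer."""
    features: int
    state_size: int = 16
    num_heads: int = 4

    @nn.compact
    def __call__(self, x):
        # Bidirectional selective SSM with a residual connection.
        fwd = SimpleSelectiveSSM(self.features, self.state_size)(x)
        bwd = SimpleSelectiveSSM(self.features, self.state_size)(x[:, ::-1, :])
        x = x + nn.Dense(self.features)(
            jnp.concatenate([fwd, bwd[:, ::-1, :]], axis=-1))
        x = nn.LayerNorm()(x)
        # Self-attention sublayer with a residual connection.
        x = x + nn.SelfAttention(num_heads=self.num_heads)(x)
        return nn.LayerNorm()(x)


if __name__ == "__main__":
    model = StripedBlock(features=32)
    x = jnp.ones((2, 128, 32))                 # (batch, length, channels)
    params = model.init(jax.random.PRNGKey(0), x)
    y = model.apply(params, x)
    print(y.shape)                             # (2, 128, 32)
```

Bidirectionality is obtained here by running the same scan over the reversed sequence and merging the two passes; stacking several such blocks would give an alternating (striped) arrangement of state-space and attention layers, in the spirit of the striped architecture the abstract reports as most accurate.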
