Refining sequence-to-expression modelling with chromatin accessibility

Abstract

Sequence-to-expression models have gained popularity in the past decade, enabling prediction of gene expression from genomic sequence alone. However, these models typically do not take into account chromatin accessibility, a major factor limiting gene regulation. We hypothesized that supplying accessibility as an input feature would allow a sequence-to-expression model to focus on important open regions of the genome. Using single-nucleus multiome RNA- and ATAC-sequencing data, we found that the predictive performance of such an augmented model was significantly greater than that of sequence-only or accessibility-only models with similar architectures. Specifically, its ability to predict the expression of highly variable genes and gene expression in other cell types improved, which we attribute to a reduction in bias originating from lowly variable genes. Moreover, post-hoc analyses revealed that higher attribution scores in the input DNA sequences of the augmented model conformed to accessibility, whereas those in the sequence-only model were scattered. Additionally, we show that fine-tuning a pre-trained sequence-only model with both sequence and accessibility can boost performance even further. We also highlight the importance of sequencing depth in sequence-to-expression prediction. Altogether, the results of this study provide a compelling argument for the inclusion of chromatin accessibility in sequence-to-expression models, as this strategy can be implemented easily and may improve downstream applications.
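
To make the idea of supplying accessibility as an input feature concrete, the sketch below shows one possible encoding: a one-hot DNA matrix with a per-base accessibility track appended as a fifth channel. This is an illustrative assumption only. The abstract does not state how the authors combine the two inputs, and the function names, the per-base resolution, and the 0-to-1 scaling of the accessibility values are all hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode DNA as (length, 4); unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def augment_with_accessibility(seq, accessibility):
    """Append a per-base accessibility track as a fifth channel -> (length, 5).

    Hypothetical helper: the accessibility values are assumed to be a
    normalised ATAC-seq signal with one value per base.
    """
    acc = np.asarray(accessibility, dtype=np.float32)
    if acc.shape != (len(seq),):
        raise ValueError("accessibility must have one value per base")
    return np.concatenate([one_hot(seq), acc[:, None]], axis=1)

# Toy example: five bases with a made-up accessibility profile.
x = augment_with_accessibility("ACGTN", [0.0, 0.5, 1.0, 0.5, 0.0])
print(x.shape)  # (5, 5)
```

A sequence-only model would receive only the first four channels, and an accessibility-only model only the last one, which is one way the three input settings compared in the abstract could share a similar architecture.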
