Refining sequence-to-expression modelling with chromatin accessibility
Abstract
Sequence-to-expression models have gained popularity in the past decade, enabling the prediction of gene expression from genomic sequence alone. However, these models typically do not take into account chromatin accessibility, a major constraint on gene regulation. We hypothesized that supplying accessibility as an input feature would allow a sequence-to-expression model to focus on important open regions of the genome. Using single-nucleus multiome RNA- and ATAC-sequencing data, we found that the predictive performance of such an augmented model was significantly greater than that of sequence-only or accessibility-only models with similar architectures. Specifically, the augmented model better predicted the expression of highly variable genes and gene expression in other cell types, an improvement we attribute to reduced bias originating from lowly variable genes. Moreover, post-hoc analyses revealed that the higher attribution scores in the input DNA sequences of the augmented model coincided with accessible regions, whereas those of the sequence-only model were scattered. Additionally, we show that fine-tuning a pre-trained sequence-only model with both sequence and accessibility can boost performance even further. We also highlight the importance of sequencing depth in sequence-to-expression prediction. Altogether, the results of this study provide a compelling argument for including chromatin accessibility in sequence-to-expression models, as this strategy is easy to implement and may improve downstream applications.
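The abstract does not say how accessibility was encoded as an input feature. One common and simple option, assumed here purely for illustration, is to stack a per-base accessibility track (for example, ATAC-seq coverage over the same window) onto the one-hot DNA encoding as an extra input channel. The minimal Python sketch below shows that idea; the channel order, the function names `one_hot` and `add_accessibility`, and the all-zero encoding of ambiguous bases are hypothetical choices, not the authors' implementation.

```python
from typing import List

# Hypothetical channel order for the one-hot encoding (not specified in the article).
BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence into len(seq) rows of 4 channels.

    Ambiguous bases such as 'N' are encoded as all zeros (an assumption).
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def add_accessibility(seq: str, atac: List[float]) -> List[List[float]]:
    """Append a per-position accessibility value as a fifth input channel.

    `atac` is a hypothetical per-base accessibility track aligned to `seq`.
    """
    if len(seq) != len(atac):
        raise ValueError("sequence and accessibility track must have equal length")
    return [row + [float(a)] for row, a in zip(one_hot(seq), atac)]


# Usage: a 5-bp window with a toy accessibility track.
x = add_accessibility("ACGTN", [0.0, 0.5, 2.0, 1.0, 0.0])
for row in x:
    print(row)
```

Under this design, a convolutional sequence-to-expression model needs only its first layer widened from four to five input channels, which is one reason accessibility augmentation can be easy to retrofit onto an existing architecture.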