Refining sequence-to-expression modelling with chromatin accessibility
Abstract
Motivation: Sequence-to-expression models typically do not consider chromatin accessibility, a major factor limiting gene regulation. We hypothesized that supplying accessibility as an input feature would allow a sequence-to-expression model to focus on important open regions of the genome.
Results: We found that such an augmented model performed significantly better than sequence-only or accessibility-only models with similar architectures. Specifically, it better predicted the expression of highly variable genes and gene expression in other cell types, and its attribution scores on the input DNA sequences were higher in accessible regions, enabling it to learn cell type-specific sequence patterns. Additionally, we show that fine-tuning a pre-trained sequence-only model with both sequence and accessibility inputs can boost performance further, and we highlight the importance of sequencing depth in sequence-to-expression prediction.
Availability and Implementation: Source code is available on GitHub at https://github.com/lapohosorsolya/accessible_seq2exp.
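The abstract does not state how accessibility is fed to the network. The sketch below is only one plausible setup, and it rests on an assumption: a per-base accessibility signal is stacked as a fifth input channel beside a one-hot DNA encoding, then passed to a 1D convolutional first layer. It also shows one possible way to fine-tune a pre-trained sequence-only model with the extra input, though this is not necessarily the authors' procedure. The first layer's filters are widened with a zero-initialised accessibility channel, so the widened layer initially reproduces the sequence-only outputs exactly. All function names (`one_hot`, `augmented_input`, `conv1d`, `widen_first_layer`) are hypothetical and do not come from the linked repository.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode DNA as a (4, L) float array; N and other symbols stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((4, len(seq)), dtype=np.float32)
    for j, base in enumerate(seq.upper()):
        i = index.get(base)
        if i is not None:
            encoded[i, j] = 1.0
    return encoded


def augmented_input(seq: str, accessibility) -> np.ndarray:
    """Hypothetical encoding: append a per-base accessibility track as channel 5 -> (5, L)."""
    track = np.asarray(accessibility, dtype=np.float32)
    if track.shape != (len(seq),):
        raise ValueError("accessibility needs exactly one value per base")
    return np.vstack([one_hot(seq), track[None, :]])


def conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Valid' 1D cross-correlation: x is (C_in, L), w is (C_out, C_in, K) -> (C_out, L-K+1)."""
    c_out, c_in, k = w.shape
    if x.shape[0] != c_in:
        raise ValueError("input channels do not match filter channels")
    n = x.shape[1] - k + 1
    out = np.empty((c_out, n), dtype=np.float32)
    for t in range(n):
        # Contract one input window (C_in, K) against all filters at once.
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return out


def widen_first_layer(w_seq: np.ndarray) -> np.ndarray:
    """Add a zero-initialised accessibility channel to pre-trained (C_out, 4, K) filters."""
    extra = np.zeros((w_seq.shape[0], 1, w_seq.shape[2]), dtype=w_seq.dtype)
    return np.concatenate([w_seq, extra], axis=1)


# Usage with toy data: random numbers stand in for real accessibility and trained weights.
rng = np.random.default_rng(0)
seq = "ACGTNACGTT"
acc = rng.random(len(seq))
x = augmented_input(seq, acc)                       # shape (5, 10)
w_seq = rng.normal(size=(8, 4, 3)).astype(np.float32)
w_aug = widen_first_layer(w_seq)                    # shape (8, 5, 3)

# Before any fine-tuning, the widened layer matches the sequence-only layer.
same_start = np.allclose(conv1d(x, w_aug), conv1d(x[:4], w_seq), atol=1e-5)
```

Zero-initialising the new channel means fine-tuning begins from the pre-trained model's behaviour. Gradient updates then decide how much weight the accessibility track receives.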