Evaluation of deep learning approaches for high-resolution chromatin accessibility prediction from genomic sequence

Abstract

Accurately predicting high-resolution chromatin accessibility signals is crucial for precisely identifying regulatory elements and understanding their role in gene expression regulation. In the absence of experimental assays, machine learning predictions provide an alternative data source for quantifying the effects of specific non-coding mutations, thereby accelerating cancer research. Although several deep learning methods have been developed to predict chromatin accessibility from DNA sequences, including sequences carrying genetic variants, most of them either operate at low resolution or treat the problem as a classification task, which makes it difficult to study variant effects on chromatin accessibility. In this work, we rigorously evaluated existing deep learning approaches on their ability to predict ATAC-seq signal at 4 bp resolution and assessed the robustness of their predictions. We further introduced a new class of deep learning architectures (ConvNextCNNs, ConvNextLSTMs, ConvNextDCNNs, and ConvNextTransformers) that use a ConvNeXt stem to effectively extract genomic features from DNA sequences. These models outperform existing methods at predicting high-resolution ATAC-seq signals across a diverse experimental setup comprising data from 2 healthy cell lines, 2 cancer cell lines, and 4 cancer patients. Moreover, we used patient-specific data from TCGA tumor samples to analyze the methods’ ability to capture changes in chromatin accessibility caused by patient-specific single-nucleotide variants. Based on predictive accuracy, robustness, and ability to predict the effects of single-nucleotide mutations, we observe that ConvNextDCNNs perform better than the other methods. This extensive study opens the door to using these patient-specific deep learning approaches to understand how specific mutations alter the regulatory landscape.
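To make the two ideas in the abstract concrete (a regression target binned at 4 bp resolution, and scoring a single-nucleotide variant by comparing predictions for the reference and alternate sequences), here is a minimal sketch under stated assumptions. It is not the authors' pipeline. Averaging per-base signal within each bin, the summary score (sum of absolute per-bin differences), and the function names `one_hot`, `bin_signal`, `toy_predict`, and `variant_effect` are all hypothetical choices made for illustration. `toy_predict` is a placeholder (GC fraction per 4 bp bin) standing in for a trained model such as a ConvNextDCNN.

```python
from typing import Callable, List, Sequence, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as an L x 4 one-hot matrix (A, C, G, T); other symbols such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def bin_signal(signal: Sequence[float], bin_size: int = 4) -> List[float]:
    """Average a per-base signal over non-overlapping bins (ASSUMPTION: mean, trailing partial bin dropped)."""
    n_bins = len(signal) // bin_size
    return [sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size for i in range(n_bins)]


def toy_predict(x: List[List[float]]) -> List[float]:
    """HYPOTHETICAL stand-in for a trained model: fraction of C/G bases in each 4 bp bin."""
    gc_per_base = [row[1] + row[2] for row in x]  # columns 1 and 2 are C and G
    return bin_signal(gc_per_base, bin_size=4)


def variant_effect(
    predict: Callable[[List[List[float]]], List[float]],
    seq: str,
    pos: int,
    alt: str,
) -> Tuple[List[float], float]:
    """Score an SNV as the per-bin difference (alt minus ref) between predicted tracks."""
    if not 0 <= pos < len(seq):
        raise ValueError("variant position lies outside the sequence")
    if alt.upper() not in BASES or len(alt) != 1:
        raise ValueError("alt must be a single base from ACGT")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_track = predict(one_hot(seq))
    alt_track = predict(one_hot(alt_seq))
    delta = [a - r for r, a in zip(ref_track, alt_track)]
    # Hypothetical scalar summary: total absolute change across all bins.
    return delta, sum(abs(d) for d in delta)


if __name__ == "__main__":
    delta, score = variant_effect(toy_predict, "ACGTAAAATTTTGGGG", 5, "G")
    print(delta, score)
```

With the toy model, changing position 5 from A to G raises the GC fraction of the second 4 bp bin from 0.0 to 0.25, so only that bin changes. With a real model, the same reference/alternate comparison yields a per-bin accessibility change, which is the kind of signal a classification-style model cannot provide directly.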
