An end-to-end generalizable deep learning framework to comprehensively analyze transcriptional regulation

Abstract

Genome annotation currently requires performing dozens of molecular assays in hundreds of cell and tissue samples, an expensive endeavor which is impractical to replicate across all species and conditions of interest. Here, we introduce BioSeq2Seq, a deep learning model that infers cell-type-specific molecular assays widely used for genome annotation by leveraging a tri-modal input: evolutionarily conserved DNA sequence features, together with cell-type-specific transcriptional activity and directionality captured by a single run-on sequencing assay. BioSeq2Seq enables flexible genome annotation tasks through parameterized configurations of input features and output targets, combined with gradient-guided architectural refinement for specific biological objectives. Our model demonstrates high accuracy across four downstream tasks, showing improvements of 10.37% in histone modification prediction, 2.25% in functional element prediction, and 5.02% in gene expression prediction compared to state-of-the-art methods. In transcription factor binding site (TFBS) prediction, it maintains performance comparable to that of leading existing approaches. By achieving competitive performance across tasks with minimal input data, BioSeq2Seq provides an efficient and low-cost alternative for genome annotation. To facilitate broader application, an online prediction service based on BioSeq2Seq is publicly available at https://dreg.dnasequence.org.
