An end-to-end generalizable deep learning framework to comprehensively analyze transcriptional regulation

Zhong Wang
Charles Danko
Zhaoxi Zhang
Xiaoya Fan
Jiaxin Zhong
Lijuan Jia
Yuanyuan Han
Chenyi Yang
Zengyou He
Xiaoyan Li
Shing-Tung Yau
Rongling Wu

Abstract

Genome annotation currently requires performing dozens of molecular assays in hundreds of cell and tissue samples, an expensive endeavor that is impractical to replicate across all species and conditions of interest. Here, we introduce BioSeq2Seq, a deep learning model that infers cell-type-specific molecular assays widely used for genome annotation by leveraging a tri-modal input: evolutionarily conserved DNA sequence features, together with cell-type-specific transcriptional activity and directionality captured by a single run-on sequencing assay. BioSeq2Seq supports flexible genome annotation tasks through parameterized configurations of input features and output targets, combined with gradient-guided architectural refinement for specific biological objectives. Our model demonstrates high accuracy across four downstream tasks, showing improvements of 10.37% in histone modification prediction, 2.25% in functional element prediction, and 5.02% in gene expression prediction compared to state-of-the-art methods. In transcription factor binding site (TFBS) prediction, it maintains performance comparable to that of leading existing approaches. By achieving competitive performance across tasks with minimal input data, BioSeq2Seq provides an efficient and low-cost alternative for genome annotation. To facilitate broader application, an online prediction service based on BioSeq2Seq is publicly available at https://dreg.dnasequence.org.

Version published to 10.21203/rs.3.rs-7242733/v1 on Research Square
Aug 18, 2025

Related articles

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations. Latest version Dec 10, 2025

Transcriptome Graph Transformer--A Graph Transformer-Based Unsupervised Model for Transcriptome Data Analysis

This article has 3 authors:
1. Teng Long
2. Sachit Satyal
3. Jean Gao
This article has no evaluations. Latest version Jan 9, 2026

Decoupled Representation Learning Improves Generalization in CRISPR Off-Target Prediction

This article has 2 authors:
1. Nyla Bhargava
2. Aditya Goswami
This article has no evaluations. Latest version Jan 18, 2026
