Soffritto: a deep-learning model for predicting high-resolution replication timing

Dante Bolzan
Ferhat Ay

Abstract

Motivation

Replication timing (RT) refers to the order in which DNA loci are replicated during S phase. RT is cell-type specific and is implicated in cellular processes including transcription, differentiation, and disease. RT is typically quantified genome-wide using two-fraction assays (e.g., Repli-Seq), which sort cells into early and late S-phase fractions and then sequence the DNA from each fraction, yielding a ratio as the RT signal. While two-fraction RT data is widely available for multiple cell lines, it is limited in its ability to capture high-resolution RT features. To address this, high-resolution Repli-Seq, which quantifies RT across 16 fractions, was developed, but it is costly and technically challenging, and very little data has been generated to date.
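To make the two-fraction signal concrete, the following is a minimal sketch of how a per-bin ratio could be computed from early and late read counts. The abstract only states that the assay yields "a ratio"; the log2 transform, the library-size normalisation, the pseudocount, and the function name `two_fraction_rt` are all assumptions made for illustration, not the authors' pipeline.

```python
import math

def two_fraction_rt(early, late, pseudocount=1.0):
    """Return an assumed log2(early/late) signal for each genomic bin.

    early, late: read counts per bin from the early and late S-phase fractions.
    Each fraction is normalised by its own (pseudocount-adjusted) total so that
    differences in sequencing depth do not shift the ratio.
    """
    if len(early) != len(late):
        raise ValueError("early and late must cover the same bins")
    early_total = sum(early) + pseudocount * len(early)
    late_total = sum(late) + pseudocount * len(late)
    return [
        math.log2(((e + pseudocount) / early_total) / ((l + pseudocount) / late_total))
        for e, l in zip(early, late)
    ]

# Toy counts for three bins: early-replicating, mid, late-replicating.
rt = two_fraction_rt([90, 50, 10], [10, 50, 90])
```

Under this convention, positive values mark bins that replicate early and negative values mark bins that replicate late. A single number per bin is what limits the resolution of the two-fraction assay compared with a 16-fraction profile.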

Results

Here we developed Soffritto, a deep learning model that predicts high-resolution RT data using two-fraction RT data, histone ChIP-seq data, GC content, and gene density as input. Soffritto is composed of a Long Short-Term Memory (LSTM) module and a prediction module. The LSTM module learns long- and short-range interactions between genomic bins, while the prediction module consists of a fully connected layer that takes the LSTM module's embeddings as input and outputs a 16-fraction probability vector for each bin. By performing both within-cell-line and cross-cell-line training and testing on five human and mouse cell lines, we show that Soffritto captures experimental 16-fraction RT signals with high accuracy and that the predicted signals allow detection of high-resolution RT patterns.
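The two-module design described above can be illustrated with a small, untrained forward pass. This is a sketch under stated assumptions, not the Soffritto implementation: the class `TinyLSTM`, the layer sizes, the single unidirectional layer, the random weights, and the use of a softmax to obtain the probability vector are all hypothetical choices. It only shows how a sequence of per-bin features could flow through an LSTM into a fully connected layer that emits 16 probabilities per bin.

```python
import math
import random

N_FRACTIONS = 16  # S-phase fractions in high-resolution Repli-Seq

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vec):
    """Compute weights @ vec + bias for a list-of-lists weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class TinyLSTM:
    """Single-layer, unidirectional LSTM with random weights (hypothetical)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to [input ; previous hidden state].
        self.W = {g: random_matrix(rng, n_hidden, n_in + n_hidden) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}

    def forward(self, sequence):
        """Return one hidden-state embedding per genomic bin."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        embeddings = []
        for x in sequence:
            z = x + h  # list concatenation: features of this bin + previous hidden state
            i = [sigmoid(v) for v in affine(self.W["i"], self.b["i"], z)]    # input gate
            f = [sigmoid(v) for v in affine(self.W["f"], self.b["f"], z)]    # forget gate
            o = [sigmoid(v) for v in affine(self.W["o"], self.b["o"], z)]    # output gate
            g = [math.tanh(v) for v in affine(self.W["g"], self.b["g"], z)]  # candidate cell
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            embeddings.append(h)
        return embeddings

def predict_fractions(features, lstm, W_out, b_out):
    """Prediction module: fully connected layer + softmax on each bin's embedding."""
    return [softmax(affine(W_out, b_out, h)) for h in lstm.forward(features)]

rng = random.Random(0)
n_features, n_hidden, n_bins = 5, 8, 10  # assumed sizes, for illustration only
lstm = TinyLSTM(n_features, n_hidden, rng)
W_out = random_matrix(rng, N_FRACTIONS, n_hidden)
b_out = [0.0] * N_FRACTIONS
# Stand-in for per-bin inputs (two-fraction RT, histone marks, GC content, gene density).
features = [[rng.random() for _ in range(n_features)] for _ in range(n_bins)]
probs = predict_fractions(features, lstm, W_out, b_out)
```

Each entry of `probs` is a length-16 vector that sums to one, matching the per-bin output described in the abstract. Because the hidden state is carried from bin to bin, each prediction can depend on neighbouring bins, which is the role the abstract assigns to the LSTM module.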

Availability

Soffritto is available at https://github.com/ay-lab/Soffritto.

Version published to 10.1101/2025.01.23.634644v1 on bioRxiv
Jan 26, 2025

