Soffritto: a deep-learning model for predicting high-resolution replication timing
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Motivation
Replication Timing (RT) refers to the order by which DNA loci are replicated during S phase. RT is cell-type specific and implicated in cellular processes including transcription, differentiation, and disease. RT is typically quantified genome-wide using two-fraction assays (e.g., Repli-Seq) which sort cells into early and late S phase fractions followed by DNA sequencing yielding a ratio as the RT signal. While two-fraction RT data is widely available in multiple cell lines, it is limited in its ability to capture high-resolution RT features. To address this, high-resolution Repli-Seq, which quantifies RT across 16 fractions, was developed, but it is costly and technically challenging with very limited data generated to date.
Results
Here we developed Soffritto , a deep learning model that predicts high-resolution RT data using two-fraction RT data, histone ChIP-seq data, GC content, and gene density as input. Soffritto is composed of a Long Short Term Memory (LSTM) module and a prediction module. The LSTM module learns long- and short-range interactions between genomic bins while the prediction module is composed of a fully connected layer that outputs a 16-fraction probability vector for each bin using the LSTM module’s embeddings as input. By performing both within cell line and cross cell line training and testing for five human and mouse cell lines, we show that Soffritto is able to capture experimental 16-fraction RT signals with high accuracy and the predicted signals allow detection of high-resolution RT patterns.
Availability
Soffritto is available at https://github.com/ay-lab/Soffritto .