Accurate de novo transcription unit annotation from run-on and sequencing data
Abstract
Functional element annotations are critical tools used to provide insight into the molecular processes governing cell development, differentiation, and disease. Run-on and sequencing assays measure the production of nascent RNAs and can provide an effective data source for discovering functional elements. However, the accurate inference of functional elements from run-on and sequencing data remains an open problem because the signal is noisy and challenging to model. Here we investigated computational approaches that convert run-on and sequencing data into annotations representing transcription units, including genes and non-coding RNAs. We developed a convolutional neural network, called convolutional discovery of gene anatomy using PRO-seq (CGAP), trained to identify different anatomical features of a transcription unit, which were then stitched together into transcript annotations using a hidden Markov model (HMM). Comparison with existing methods showed a significant performance improvement using our novel CGAP-HMM approach. We developed a voting system that ensembles the top three annotation strategies, resulting in large and significant improvements in transcription unit annotation accuracy over the best performing individual method. Finally, we also report a conditional generative adversarial network (cGAN) as a generative approach to transcription unit annotation that shows promise for further development. Collectively, our work provides novel tools for de novo transcription unit annotation from run-on and sequencing data that are accurate enough to be useful in many applications.
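The abstract does not specify how the voting system combines its three annotation strategies. As a minimal illustrative sketch only, assuming each method yields a per-position 0/1 call for "inside a transcription unit" on a single strand, a simple majority vote could look like the following. The function names `majority_vote` and `to_intervals` and the toy input are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def majority_vote(calls):
    """Per-position majority vote over binary calls.

    calls: array of shape (n_methods, n_positions) with 0/1 entries
           (assumed encoding: 1 = position lies inside a transcription unit).
    Returns a 0/1 array of length n_positions.
    """
    calls = np.asarray(calls, dtype=int)
    n_methods = calls.shape[0]
    # A position is kept when strictly more than half of the methods call it.
    return (calls.sum(axis=0) * 2 > n_methods).astype(int)


def to_intervals(mask):
    """Convert a 0/1 mask into half-open [start, end) intervals."""
    padded = np.concatenate(([0], np.asarray(mask, dtype=int), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1)    # 1 -> 0 transitions
    return list(zip(starts.tolist(), ends.tolist()))


# Toy calls from three hypothetical annotation methods over eight positions.
calls = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 1],
])
consensus = majority_vote(calls)
intervals = to_intervals(consensus)
print(consensus.tolist())  # [0, 1, 1, 1, 0, 0, 1, 1]
print(intervals)           # [(1, 4), (6, 8)]
```

With three methods, this rule keeps a position when at least two agree. A real ensemble would also need to handle strand and the reconciliation of transcription unit boundaries, which this sketch leaves out.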