Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data

This article has been reviewed by the following groups

Abstract

Background

Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technologies sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not yet available.

Findings

We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads.

Conclusions

As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giaa061

    Saber Hafezqorani (1, 2), Chen Yang (1, 2), Ka Ming Nip (1, 2), René L Warren (1), Inanc Birol (1, 3)

    1 Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
    2 Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
    3 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada

    For correspondence: ibirol@bcgsc.ca

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa061 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102272
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102273