Nanopore guided annotation of transcriptome architectures

Jonathan S. Abebe
Yasmine Alwie
Erik Fuhrmann
Jonas Leins
Julia Mai
Ruth Verstraten
Sabrina Schreiner
Angus C. Wilson
Daniel P. Depledge

Abstract

Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. The resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses depends heavily on high-resolution transcriptome annotations combined with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to the small, gene-dense genomes of viruses, where prevalent alternative splicing and overlapping RNAs make distinct transcript isoforms difficult to identify. To resolve this, we identified key characteristics of DRS data sets that inform resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). We demonstrate, using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, that NAGATA outperforms existing transcriptome annotation software and yields a consistently high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41), for which we identify 77 distinct transcripts encoding at least 23 different proteins.

IMPORTANCE

The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. Knowing it is critical to understanding the biology of an organism and to accurate transcriptomic and epitranscriptomic analyses. Annotating transcriptomes remains a complex task, particularly in small gene-dense organisms such as viruses, which maximize their coding capacity through overlapping RNAs. To resolve this, we have developed a new software package, nanopore guided annotation of transcriptome architectures (NAGATA), which uses nanopore direct RNA sequencing (DRS) data sets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.

Version published to 10.1128/msystems.00505-24
Jul 23, 2024
Version published to 10.1101/2024.04.02.587744 on bioRxiv
Apr 3, 2024
