Multicellular, IVT-derived, unmodified human transcriptome for nanopore-direct RNA analysis

Curation statements for this article:
  • Curated by GigaByte


    Editors Assessment:

    Oxford Nanopore direct RNA sequencing (DRS) is a relatively new sequencing technology that enables measurement of RNA modifications. In vitro transcription (IVT)-based negative controls (i.e. modification-free transcripts) are a practical and targeted control for direct RNA sequencing, providing a baseline measurement for canonical nucleotides within a matched and biologically derived sequence context. This work presents exactly this type of dataset: a long-read, multicellular, poly-A RNA-based, IVT-derived, unmodified transcriptome. Peer review flagged that more statistical analyses of data quality were needed, and these were provided. The resulting data provide a resource to the direct RNA analysis community, helping to reduce the need for expensive IVT library preparation and sequencing for human samples, and also serve as a framework for RNA modification analysis in other organisms.

    This evaluation refers to versions 1 and 2 of the preprint.


Abstract

Nanopore direct RNA sequencing (DRS) enables measurements of RNA modifications. Modification-free transcripts are a practical and targeted control for DRS, providing a baseline measurement for canonical nucleotides within a matched and biologically derived sequence context. However, these controls can be challenging to generate and carry nanopore-specific nuances that can impact analysis. We produced DRS datasets using modification-free transcripts from in vitro transcription (IVT) of cDNA from six immortalized human cell lines. We characterized variation across cell lines and demonstrated how these may be interpreted. These data will serve as a versatile control and resource to the community for RNA modification analysis of human transcripts.
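As a rough illustration of how a modification-free IVT control can serve as a baseline, the sketch below compares per-read current levels at a single reference position in a native sample with those at the same position in the IVT sample, using the two-sample Kolmogorov-Smirnov statistic D (the largest gap between the two empirical distributions). The function name, the toy current values and the choice of the KS statistic are assumptions for illustration, not the analysis used in this work.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical per-read mean current levels (pA) at one reference position.
ivt_baseline = [98.1, 99.4, 100.2, 100.9, 101.5, 99.8, 100.4]
native_sample = [104.2, 105.1, 99.9, 106.3, 105.7, 104.8, 100.6]

d = ks_statistic(native_sample, ivt_baseline)
print(f"KS D = {d:.3f}")
```

With these toy values D = 5/7 (about 0.714), meaning the native distribution is shifted relative to the IVT baseline; in practice such a statistic would still need a significance test and multiple-testing correction across positions.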




    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.129), and the reviews have been published under the same license. These reviews are as follows:

    Reviewer 1. Joshua Burdick

    Is the language of sufficient quality?

    Yes. In line 284, "bioinformatic" may be more often used than "BioInformatic", but the meaning is clear.

    Are the data and metadata consistent with relevant minimum information or reporting standards?

    Yes. Presumably the files that are not in SRA (e.g. eventalign data) will need to be uploaded to the GigaByte site.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Yes. Line 177 should presumably be "nanopolish eventalign".

    Is there sufficient data validation and statistical analyses of data quality?

    Yes. In my opinion, Figure 3(A) nicely illustrates the uncertainty in current nanopore data, which is useful.
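Reviewer 1 mentions the eventalign data, which are not in SRA, and notes that Figure 3(A) illustrates the uncertainty in nanopore current measurements. For readers reusing those files, a minimal sketch of a per-position summary (event count, mean and standard deviation of the current level) is shown below. It assumes nanopolish eventalign-style tab-separated files whose header contains columns named contig, position and event_level_mean; check this against the downloaded files. The function name and the toy rows are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_eventalign(handle):
    """Summarize event_level_mean per (contig, position) from eventalign-style TSV.

    Assumes a tab-separated header containing 'contig', 'position' and
    'event_level_mean'; other columns are ignored. All events at a position
    are pooled, so a read aligned to several events there counts several times.
    """
    levels = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["contig"], int(row["position"]))
        levels[key].append(float(row["event_level_mean"]))

    summary = {}
    for key, values in levels.items():
        # statistics.stdev needs at least two values; report 0.0 for a single event.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (len(values), statistics.mean(values), spread)
    return summary


# Toy input with an abbreviated, hypothetical header and made-up values.
toy = io.StringIO(
    "contig\tposition\treference_kmer\tread_index\tevent_level_mean\n"
    "tx1\t10\tGGACT\t0\t120.0\n"
    "tx1\t10\tGGACT\t1\t124.0\n"
    "tx1\t11\tGACTA\t0\t95.5\n"
)
for (contig, pos), (n, mean, sd) in sorted(summarize_eventalign(toy).items()):
    print(contig, pos, n, round(mean, 2), round(sd, 2))
```

Pooling events keeps the sketch short; collapsing events to one value per read before summarizing would be a reasonable refinement.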

    Additional Comments:

    The RNA samples, and nanopore sequencing data, should be useful as a negative control. Sequencing these IVT RNA samples using the newer ONT RNA004 pore and kit might also be useful.

    Reviewer 2. Jiaxu Wang

    Is there sufficient data validation and statistical analyses of data quality?

    No. The authors performed DRS on in vitro transcribed RNA from six cell lines to remove possible natural modifications. The data can be used as a control RNA pool for studies of natural or artificial modifications. However, more statistical analyses of data quality should be performed; see comments below:

    (1) To broaden the possible uses of these data, some QC analysis should be provided to confirm the quality of the sequencing data. For example: 1) What is the correlation between the in vitro transcribed RNA and the original DRS for each cell line? 2) How many genes have been captured in each cell line?

    (2) In Figure 2B, the authors provide three conditions for ‘exclude’ and ‘include’. Some statistical analysis should be performed to show how many cases fall under condition 1, condition 2, and condition 3. How many mismatches appear in only one cell line, in some cell lines, or in all cell lines? Genes that are correct across all cell lines may be more confident references for modification analysis.

    (3) Different reads of the same gene could carry different mismatches in the IVT RNA due to RT-PCR bias or other reasons (especially for lowly expressed RNAs). For example, if there are 100 reads in total at a given position, with 90 reads showing the correct nucleotide and 10 reads showing a mismatch in the IVT sample, how should the signal be defined as the control reference? Given that natural modification levels in RNA are low, some threshold should be applied for confident results; for example, what is the lowest expression threshold at which a position can be used as a confident control reference?
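Points (2) and (3) of this review can be made concrete with a small sketch: compute the per-position mismatch frequency in each IVT cell line, trust a position only when coverage is high enough, and count how many cell lines flag each position (one, some, or all). The thresholds (MIN_COVERAGE = 20, MAX_MISMATCH_FREQ = 0.05), function names, cell line labels and counts are hypothetical placeholders, not values from the study.

```python
from collections import Counter

# Hypothetical thresholds; the study does not prescribe these values.
MIN_COVERAGE = 20         # fewer reads than this: too little evidence either way
MAX_MISMATCH_FREQ = 0.05  # mismatch rate above this: not trusted as a clean baseline


def classify_position(n_match, n_mismatch,
                      min_coverage=MIN_COVERAGE, max_mismatch_freq=MAX_MISMATCH_FREQ):
    """Label one position in one IVT sample as 'uncertain', 'flagged' or 'clean'."""
    coverage = n_match + n_mismatch
    if coverage < min_coverage:
        return "uncertain"
    if n_mismatch / coverage > max_mismatch_freq:
        return "flagged"
    return "clean"


def count_flagging_lines(per_line_counts):
    """Count, for each position, how many cell lines flag it.

    per_line_counts maps cell line -> {position: (n_match, n_mismatch)}.
    """
    flagged = Counter()
    for counts in per_line_counts.values():
        for position, (n_match, n_mismatch) in counts.items():
            if classify_position(n_match, n_mismatch) == "flagged":
                flagged[position] += 1
    return flagged


def sharing_category(n_flagging, n_lines):
    """Bucket a position as flagged in one, some, or all cell lines."""
    if n_flagging == n_lines:
        return "all"
    if n_flagging == 1:
        return "one"
    return "some"


# The reviewer's example: 90 matching and 10 mismatching reads (10% mismatches).
print(classify_position(90, 10))

# Made-up counts for three hypothetical cell lines.
counts = {
    "line_A": {("tx1", 10): (90, 10), ("tx1", 42): (50, 0), ("tx2", 7): (5, 3)},
    "line_B": {("tx1", 10): (95, 12), ("tx1", 42): (40, 1)},
    "line_C": {("tx1", 10): (60, 1), ("tx2", 7): (30, 9)},
}
for position, n in sorted(count_flagging_lines(counts).items()):
    print(position, n, sharing_category(n, len(counts)))
```

Reporting low-coverage positions as "uncertain" rather than "clean" is one way to encode the minimum-expression requirement the reviewer asks about, so that a baseline resting on a handful of reads is not credited as reliable.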

    Is there sufficient information for others to reuse this dataset or integrate it with other data?

    No. To broaden the possible uses of these data, more QC analyses should be performed; please refer to my comments above.
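The first QC point raised by Reviewer 2 (the correlation between IVT and native DRS for each cell line, and the number of genes captured) can be sketched as follows, assuming per-gene read counts have already been obtained for both libraries of one cell line. Pearson correlation on log1p-transformed counts, the min_reads cutoff, the gene labels and the function names are illustrative choices, not the analysis the authors added in revision.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def expression_qc(native_counts, ivt_counts, min_reads=1):
    """Per-cell-line QC: log-count correlation and number of genes captured.

    native_counts / ivt_counts map gene -> read count for one cell line.
    """
    shared = sorted(set(native_counts) & set(ivt_counts))
    # log1p damps the influence of a few very highly expressed genes.
    x = [math.log1p(native_counts[g]) for g in shared]
    y = [math.log1p(ivt_counts[g]) for g in shared]
    captured = {
        "native": sum(1 for c in native_counts.values() if c >= min_reads),
        "ivt": sum(1 for c in ivt_counts.values() if c >= min_reads),
    }
    return pearson(x, y), captured


# Made-up counts for one hypothetical cell line.
native = {"gene_a": 500, "gene_b": 800, "gene_c": 40, "gene_d": 15}
ivt = {"gene_a": 450, "gene_b": 700, "gene_c": 55, "gene_d": 10, "gene_e": 0}
r, captured = expression_qc(native, ivt)
print(f"r = {r:.3f}; genes captured: {captured}")
```

A rank-based (Spearman) correlation would be an equally reasonable choice and does not depend on the log transform.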

    Re-review: I am happy to see the changes. Thanks!