High-Spatiotemporal-Resolution Nanopore Sequencing of SARS-CoV-2 and Host Cell RNAs

This article has been Reviewed by the following groups

Read the full article

Abstract

Recent studies have disclosed the genome, transcriptome and epigenetic compositions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on gene expression of the host cells. It has been demonstrated that, besides the major canonical transcripts, the viral genome also codes for non-canonical RNA molecules. While the structural characterizations have revealed a detailed transcriptomic architecture of the virus, the kinetic studies provided poor and often misleading results on the dynamics of both the viral and host transcripts due to the low temporal resolution of the infection event and the low virus/cell ratio (MOI=0.1) applied for the infection. In this study, we used direct cDNA and direct RNA nanopore sequencings for the generation of high-coverage, high-temporal-resolution transcriptomic datasets on SARS-CoV-2 and on primate host cells infected with a high virus titer (MOI=5). Sixteen sampling time points ranging from 1 to 96h with a varying time resolution and three biological replicates were used in the experiment for both the infected and the non-infected cells.

Article activity feed

  1. AbstractRecent studies

    This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giac094 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    ** Reviewer name: Milad Miladi**

    In this work, Tombacz et al. provide a Nanopore RNA sequencing dataset of SARS-CoV-2 infected cells in several timepoints and sequencing setups. Both direct RNA-seq and cDNA-seq techniques have been utilized, and multiplex barcoded sequencing has been done for combining the samples. The dataset can be helpful to the community, such as for future transcriptomic studies of SARS-CoV-2, especially for studying the infection and expression dynamics. The text is well written and easy to follow. I find this work valuable; however, I can see several limitations in the analysis and representation of the results.

    Notably, the figures and tables representing statistical and biological insights of the data points are underworked, lack clarity, and provide limited information about the experiment. Further visualizations, analysis, and data processing could help to reveal the value and insights from this sequencing experiment.

    Comments: The presentation of reads coverage and lengths in Figs 1 & 2 are elementary, unpolished, and non-informative. Better annotation and labeling in Fig. 1 would be needed. Stacking so many violin plots in Fig 2 does not provide any valuable information and would only misguide. What are the messages of these figures? What do the authors expect the readers to catch from them? As noted, stacking many similar figures does not add further information. The authors may want to consider alternative representations and aggregation of the information, besides or replacing the current plots. For example, in Fig.2, scatter/line plots for the median & 25/75% percentile ranges, with an aggregation of the three replicates in on x-axis position, could help identify potential trends over the time points.

    It is better to start the paper by presenting the current Fig.3 as the first one. This figure is the core of contributions and methodologies, and current Figs 1&2 are logical followups of this step.

    There is a very limited description in the Figure Legends. The reader should be able to understand essential elements of the figures merely based on the Figure and its legend.

    This study does not provide much notable biological insight without demultiplexing the reads of each experimental condition into genomic and subgenomic subsets. Distinguishing the genomic and subgenomic reads and analyzing their relative ratio is essential in this temporal study. Due to the transcription process of coronaviruses, the genomic and subgenomic reads have very different characteristics, such as length distribution and cellular presence. Genomic and subgenomic reads can be reliably identified by their coverage and splicing profiles, for enough long reads. It is essential that the authors further process the data by categorizing the genomic/subgenomic reads and the provide statistics such as read length for each category. It would also be interesting to observe the ratio of genomic vs. subgenomic reads. This is an indicative metric of the infection state of the sample. An active infection has a higher sub-genomic share, while, e.g., a very early infection stage is expected to have a larger portion of genomic reads.

    Page-3: "[..] the nested set of subgenomic RNAs (sgRNAs) mapping to the 3'-third of the viral genome". Is 3'-third a typo? Otherwise, the text is not understandable.

    Page-4: " because after a couple of hours, the virus can initiate a new infection cycle within the noninfected cells." More context and elaboration by citing some references can help to support the authors' claim. A gradual infection of non-infected cells can be assumed. However, "a couple of hours" and "initiate a new infection cycle" need further support in a scientific manuscript. The infection process is fairly gradual, but the wording here infers a sudden transition to infecting other cells only at a particular time point.

    Page-4: "[..]undergo alterations non-infected cells during the propagation therefore, we cannot decide whether the transcriptional changes in infected are due to the effect of the virus or to the time factor of culturing." This can be strong support for why this experiment has been done and for the value of this dataset. I would suggest mentioning this in the abstract to highlight the motivation.

    Page-4: "based studies have revealed a hidden transcriptional complexity in viruses [13,14]" Besides Kim et. al, the first DRS experiments of coronaviruses have not been cited (doi.org/10.1101/gr.247064.118, doi.org/10.1101/2020.07.18.204362, doi.org/10.1101/2020.03.05.976167)

    Table-1: dcDNA is quite an uncommon term. In general, here and elsewhere in the text, insisting on a direct cDNA is a bit misleading. A "direct" cDNA sequencing is still an indirect sequencing of RNA molecules!

    Figs S2 and S3: Please also report the ratio of virus to host reads.

  2. Abstract

    This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giac094 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows: ** Reviewer name: George Taiaroa**

    The authors provide a potentially useful dataset relating to transcripts from cultured SARS-CoV-2 material in a commonly used cell line (Vero). Relevant sequence data are publicly available and descriptions on the preparation of these data are for the most part detailed and adequate, although this is lacking at times.

    Although the authors state that this dataset overcomes the limitations of available transcriptomic datasets, I do not believe this to be an accurate statement; based on comparable published work in this cell line, transcriptional activity is expected to peak at approximately one day post infection (Chang et al. 2021, Transcriptional and epi-transcriptional dynamics of SARS-CoV-2 during cellular infection), with the 96 hour period of infection described likely representing overlapping cellular infections of different stages.

    Secondly, many in the field have moved to use more appropriate cell lines in place of the Vero African Monkey kidney cell line, to better reflect changes in transcription during the course of infection in human and/or lung epithelial cells (See Finkel et al. 2020, The coding capacity of SARS-CoV-2). Lastly, the study would ideally be performed with a publicly available SARS-CoV-2 strain, as has been the case for earlier studies of this nature to allow for reproducibility and extension of the work presented by others.

    That said, the data are publicly available and could be of use. Primary comments I think that a statement detailing the ethics approval for this work would be essential, given materials used were collected from posthumously from a patient. Similarly, were these studies performed under appropriate containment, given classifications of SARS-CoV-2 at the time of the study? I do not know what the authors mean in reference to a 'mixed time point sample' for the one direct RNA sample in this study; could this please be clarified? Secondary comments I believe the authors may over-simplify discontinuous extension of minus strands in saying that

    'The gRNA and the sgRNAs have common 3'-termini since the RdRP synthesizes the positive sense RNAs from this end of the genome'. Each of the 5' and 3' sequence of gRNAs/sgRNAs are shared through this process of replication. 'Infections are typically carried out using fresh, rapidly growing cells, and fresh cultures are also used as mock-infected cells.However, gene expression profiles may undergo alterations non-infected cells during the propagation therefore, we cannot decide whether the transcriptional changes in infected are due to the effect of the virus or to the time factor of culturing. This phenomenon is practically never tested in the experiments.' I do not follow what these sentences are referring to. 'Altogether, we generated almost 64 million long-reads, from which more than 1.8 million reads mapped to the SARS-CoV-2 and almost 48 million to the host reference genome, respectively (Table 1).

    The obtained read count resulted in a very high coverage across the viral genome (Figure 1). Detailed data on the read counts, quality of reads including read lengths (Figure 2), insertions, deletions, as well as mismatches are summarized Supplementary Tables.' Could this perhaps be more appropriately placed in the data analysis section, rather than background?

  3. SciScore for 10.1101/2021.08.20.457128: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethicsnot detected.
    Sex as a biological variableCollection, detection and isolation of the virus: The SARS-CoV-2 virus was isolated from the human nasopharyngeal swab of the RT-PCR positive (Ct 22) 77-year-old male patient during the official COVID-19 surveillance program at the Veterinary Diagnostic Directorate of the National Food Chain Safety Office (Budapest, Hungary) with the cooperation of the Complex Medical Center (Budapest) in November 2020 at the second wave of COVID-19 pandemic in Hungary.
    RandomizationThe following programs were also used for analysis: SamTools20 [for generation binary alignment (bam) and indexed (bai) files, as well as to categorize the data into viral-mapped, host-mapped and unmapped], the bamCoverage tool from deepTools21 (to generate coverage tracks), GATK22 Picard’s DownsampleSam tool which applies a downsampling algorithm to retain only a deterministically random subset of the reads.
    Blindingnot detected.
    Power Analysisnot detected.
    Cell Line Authenticationnot detected.

    Table 2: Resources

    Experimental Models: Cell Lines
    SentencesResources
    Vero cells were incubated at 37°C in a humidified 5% CO2 atmosphere until confluency (~8 × 106 cells) was reached.
    Vero
    suggested: None
    The virus was passaged twice at low MOI in Vero E6 cells to obtain a working stock used in the experiments.
    Vero E6
    suggested: RRID:CVCL_XD71)
    Software and Algorithms
    SentencesResources
    Pre-processing and data analysis: The MinION raw data was basecalled using ONT Guppy basecalling software version 5.0.11. using -- qscore_filtering.
    MinION
    suggested: (MinION, RRID:SCR_017985)
    The following programs were also used for analysis: SamTools20 [for generation binary alignment (bam) and indexed (bai) files, as well as to categorize the data into viral-mapped, host-mapped and unmapped], the bamCoverage tool from deepTools21 (to generate coverage tracks), GATK22 Picard’s DownsampleSam tool which applies a downsampling algorithm to retain only a deterministically random subset of the reads.
    GATK22
    suggested: None
    Picard’s
    suggested: None

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.