The architecture of SARS-CoV-2 transcriptome

Abstract

SARS-CoV-2 is a betacoronavirus that is responsible for the COVID-19 pandemic. The genome of SARS-CoV-2 was reported recently, but its transcriptomic architecture is unknown. Using two complementary sequencing techniques, we present a high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome. DNA nanoball sequencing shows that the transcriptome is highly complex owing to numerous recombination events, both canonical and noncanonical. In addition to the genomic RNA and subgenomic RNAs common to all coronaviruses, SARS-CoV-2 produces a large number of transcripts encoding unknown ORFs with fusions, deletions, and/or frameshifts. Using nanopore direct RNA sequencing, we further find at least 41 RNA modification sites on viral transcripts, with the most frequent motif being AAGAA. Modified RNAs have shorter poly(A) tails than unmodified RNAs, suggesting a link between the internal modification and the 3′ tail. Functional investigation of the unknown ORFs and RNA modifications discovered in this study will open new directions for understanding the life cycle and pathogenicity of SARS-CoV-2.

Highlights

  • We provide a high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome using nanopore direct RNA sequencing and DNA nanoball sequencing.

  • The transcriptome is highly complex owing to numerous recombination events, both canonical and noncanonical.

  • In addition to the genomic and subgenomic RNAs common to all coronaviruses, SARS-CoV-2 produces transcripts encoding unknown ORFs.

  • We discover at least 41 potential RNA modification sites with an AAGAA motif.
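
As a rough illustration of what "sites with an AAGAA motif" means at the sequence level, the sketch below lists every position where the motif occurs in a sequence, including overlapping occurrences. It is not the detection method used in the study, which relies on nanopore direct RNA sequencing; the function name and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def find_motif_positions(sequence, motif="AAGAA"):
    """Return 0-based start positions of every motif occurrence.

    A zero-width lookahead is used so that overlapping occurrences
    (e.g. AAGAAGAA contains two AAGAA hits) are all reported.
    """
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Toy sequence made up for illustration; it is not viral sequence data.
toy_sequence = "CAAGAAGAAC"
print(find_motif_positions(toy_sequence))
```

A motif match on its own says nothing about whether a site is modified; it only narrows down where a motif-associated site could sit.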

Article activity feed

  1. SciScore for 10.1101/2020.03.12.988865:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
    Cell Line Authentication: not detected.

    Table 2: Resources

    Experimental Models: Cell Lines

    Sentence: Nanopore direct RNA sequencing: For nanopore sequencing on non-infected and SARS-CoV-2-infected Vero cells, each 4 μg of DNase I (Takara)-treated total RNA in 8 μl was used for library preparation following the manufacturer’s instruction (the Oxford Nanopore DRS protocol, SQK-RNA002) with minor adaptations.
    Resource: Vero
    Suggested: None
    Software and Algorithms

    Sentence: Templates for in vitro transcription were prepared by PCR (Q5® High-Fidelity DNA Polymerase [NEB]) with virus-specific PCR primers followed by in vitro transcription (MEGAscript™ T7 Transcription Kit [Invitrogen]).
    Resource: MEGAscript™
    Suggested: None

    Sentence: The library was loaded on FLO-MIN106D flow cell followed by 42 hours sequencing run on MinION device (Oxford Nanopore Technologies).
    Resource: MinION
    Suggested: (MinION, RRID:SCR_017985)

    Sentence: The sequence reads were aligned to the reference sequence database composed of the C. sabaeus genome (ENSEMBL release 99), a SARS-CoV-2 genome, yeast ENO2 cDNA (YHR174W), and human ribosomal DNA complete repeat unit (GenBank U13369.1) using minimap2 2.17 (Li, 2018) with options “-k 13 -x splice -N 32 -un”.
    Resource: ENSEMBL
    Suggested: (Ensembl, RRID:SCR_002344)

    Sentence: We used STAR (Dobin et al., 2013) with many switches to completely turn off the penalties of non-canonical eukaryotic splicing: “--outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --outSJfilterOverhangMin 12 12 12 12 --outSJfilterCountUniqueMin 1 1 1 1 --outSJfilterCountTotalMin 1 1 1 1 --outSJfilterDistToOtherSJmin 0 0 0 0 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --scoreGapNoncan -4 --scoreGapATAC -4 --chimOutType WithinBAM HardClip --chimScoreJunctionNonGTAG 0 --alignSJstitchMismatchNmax -1 -1 -1 -1 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000”.
    Resource: STAR
    Suggested: (STAR, RRID:SCR_015899)
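
The minimap2 and STAR option strings quoted in the table can be hard to read as single lines, so the sketch below shows one way to assemble them into argument lists and to group the STAR switches by flag for inspection. It only builds the commands and does not run either tool. The option strings are copied from the sentences above with whitespace normalized; the file and directory names, the helper function names, and the added STAR arguments `--genomeDir` and `--readFilesIn` are assumptions introduced for illustration, since the quoted text does not give the input paths or any other arguments the authors may have used.

```python
import shlex

# Option strings as quoted in the SciScore table (whitespace normalized).
MINIMAP2_OPTIONS = "-k 13 -x splice -N 32 -un"
STAR_OPTIONS = (
    "--outFilterType BySJout --outFilterMultimapNmax 20 "
    "--alignSJoverhangMin 8 --outSJfilterOverhangMin 12 12 12 12 "
    "--outSJfilterCountUniqueMin 1 1 1 1 --outSJfilterCountTotalMin 1 1 1 1 "
    "--outSJfilterDistToOtherSJmin 0 0 0 0 --outFilterMismatchNmax 999 "
    "--outFilterMismatchNoverReadLmax 0.04 --scoreGapNoncan -4 "
    "--scoreGapATAC -4 --chimOutType WithinBAM HardClip "
    "--chimScoreJunctionNonGTAG 0 --alignSJstitchMismatchNmax -1 -1 -1 -1 "
    "--alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000"
)


def minimap2_command(reference_fasta, reads_fastq):
    """Build a minimap2 argument list: options, then target, then query."""
    return ["minimap2", *shlex.split(MINIMAP2_OPTIONS), reference_fasta, reads_fastq]


def star_command(genome_dir, reads_fastq):
    """Build a STAR argument list; the input arguments are assumed, not quoted."""
    return [
        "STAR",
        "--genomeDir", genome_dir,      # assumption: path to a prebuilt STAR index
        "--readFilesIn", reads_fastq,   # assumption: a single read file
        *shlex.split(STAR_OPTIONS),
    ]


def group_star_options(option_string):
    """Map each '--flag' to its list of values, preserving the quoted order."""
    grouped = {}
    current_flag = None
    for token in shlex.split(option_string):
        if token.startswith("--"):
            # Values such as -4 or -1 start with a single dash, so they
            # are not mistaken for flags here.
            current_flag = token
            grouped[current_flag] = []
        else:
            grouped[current_flag].append(token)
    return grouped


if __name__ == "__main__":
    # Hypothetical file names, used only to show the assembled commands.
    print(shlex.join(minimap2_command("reference.fa", "reads.fastq")))
    print(shlex.join(star_command("star_index", "reads.fastq")))
    for flag, values in group_star_options(STAR_OPTIONS).items():
        print(flag, " ".join(values))
```

Grouping the switches by flag makes it easier to see that several junction-filter settings take four values each, which is easy to miss in the single-line quotation.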

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.