Unambiguous detection of SARS-CoV-2 subgenomic mRNAs with single-cell RNA sequencing

Abstract

Single-cell RNA sequencing (scRNA-Seq) studies have provided critical insight into the pathogenesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19). scRNA-Seq library preparation methods and data processing workflows are generally designed for the detection and quantification of eukaryotic host mRNAs and not viral RNAs. Here, we compare different scRNA-Seq library preparation methods for their ability to quantify and detect SARS-CoV-2 RNAs with a focus on subgenomic mRNAs (sgmRNAs). We show that compared to 10X Genomics Chromium Next GEM Single Cell 3′ (10X 3′) libraries or 10X Genomics Chromium Next GEM Single Cell V(D)J (10X 5′) libraries sequenced with standard read configurations, 10X 5′ libraries sequenced with an extended length read 1 (R1) that covers both cell barcode and transcript sequence (termed “10X 5′ with extended R1”) increase the number of unambiguous reads spanning leader-sgmRNA junction sites. We further present a data processing workflow, single-cell coronavirus sequencing (scCoVseq), which quantifies reads unambiguously assigned to viral sgmRNAs or viral genomic RNA (gRNA). We find that combining 10X 5′ with extended R1 library preparation/sequencing and scCoVseq data processing maximizes the number of viral UMIs per cell quantified by scRNA-Seq. Corresponding sgmRNA expression levels are highly correlated with expression in matched bulk RNA-Seq data sets quantified with established tools for SARS-CoV-2 analysis. Using this scRNA-Seq approach, we find that SARS-CoV-2 gene expression is highly correlated across individual infected cells, which suggests that the proportion of viral sgmRNAs remains generally consistent throughout infection. 
Taken together, these results and corresponding data processing workflow enable robust quantification of coronavirus sgmRNA expression at single-cell resolution, thereby supporting high-resolution studies of viral RNA processes in individual cells.
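
To illustrate what a read "spanning a leader-sgmRNA junction site" looks like in alignment terms, the following is a minimal sketch, not the scCoVseq implementation. It assumes a spliced aligner has already reported each read as a 1-based start position plus a CIGAR string in which the leader-to-body jump appears as an `N` gap. The function names and the `leader_end` and `min_gap` defaults are hypothetical placeholders chosen for illustration and are not values taken from the article.

```python
import re

# Matches each (length, operation) pair in a SAM CIGAR string.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def find_junctions(pos, cigar):
    """Return (donor_end, acceptor_start) pairs, 1-based, for each N gap.

    pos is the 1-based leftmost aligned reference position of the read.
    """
    junctions = []
    ref = pos  # reference coordinate of the next aligned base
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            # Last base before the gap, first base after the gap.
            junctions.append((ref - 1, ref + n))
            ref += n
        elif op in "MD=X":
            ref += n
        # I, S, H and P do not consume reference positions.
    return junctions


def classify_read(pos, cigar, leader_end=100, min_gap=1000):
    """Label a read by whether it carries a leader-to-body junction.

    leader_end and min_gap are illustrative assumptions: a junction counts
    only if it starts within the first leader_end bases of the genome and
    skips at least min_gap reference bases.
    """
    for donor, acceptor in find_junctions(pos, cigar):
        if donor <= leader_end and acceptor - donor >= min_gap:
            return "leader_junction"
    return "no_leader_junction"
```

A read labeled `leader_junction` here is only a candidate sgmRNA read. Deciding which sgmRNA it belongs to, and which reads unambiguously derive from gRNA, requires the viral annotation described in the article.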

IMPORTANCE

Single-cell RNA sequencing (scRNA-Seq) has emerged as a valuable tool to study host-virus interactions, especially for coronavirus disease 2019 (COVID-19). Here we compare the performance of different scRNA-Seq library preparation methods and sequencing strategies to detect SARS-CoV-2 RNAs and develop a data processing workflow to quantify unambiguous sequence reads derived from SARS-CoV-2 genomic RNA and subgenomic mRNAs. After establishing a workflow that maximizes the detection of SARS-CoV-2 subgenomic mRNAs, we explore patterns of SARS-CoV-2 gene expression across cells with variable levels of total viral RNA, assess host gene expression differences between infected and bystander cells, and identify non-canonical and lowly abundant SARS-CoV-2 RNAs. The sequencing and data processing strategies developed here can enhance studies of coronavirus RNA biology at single-cell resolution and thereby contribute to our understanding of viral pathogenesis.

Article activity feed

  1. SciScore for 10.1101/2021.11.22.469642:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    SARS-CoV nucleocapsid (N) antibody (clone 1C7C7) (kindly provided by Thomas Moran, Icahn School of Medicine at Mount Sinai, New York, NY), conjugated to AlexaFluor 647 was diluted 1:400 in perm-wash buffer, and added directly to samples.
    SARS-CoV nucleocapsid (N)
    suggested: None
    Blocked coverslips were incubated with mouse anti-SARS-CoV N antibody (clone 1C7, 1:500 in 4% BSA PBS) overnight at 4C, washed three times with PBS, and incubated for 45 minutes with 1:500 AlexaFluor 488-conjugated anti-mouse (Invitrogen A11001, 1:500 in 4% BSA PBS) plus DAPI (Thermo Fisher Scientific D1306, 1:1000 in 4% BSA PBS) at room temperature.
    anti-SARS-CoV N
    suggested: None
    anti-mouse
    suggested: (Thermo Fisher Scientific Cat# A-11001, RRID:AB_2534069)
    Experimental Models: Cell Lines
    Sentences | Resources
    All SARS-CoV-2 propagations and experiments were performed in a Biosafety Level 3 facility in compliance with institutional protocols and federal guidelines. scRNAseq: For scRNAseq experiments, Vero-E6 cells in 6 well plates were infected with SARS-CoV-2 at a MOI of 0.1, or with an equivalent volume of control media, in reduced-serum media (2% FBS) for 24 hours.
    Vero-E6
    suggested: None
    Immunofluorescence microscopy: Vero E6 were seeded in 6-well plates (Falcon REF-353046) with one coverslip (Fisher Scientific 12-550-143) per well.
    Vero E6
    suggested: None
    Software and Algorithms
    Sentences | Resources
    Fastq files for 5′ libraries sequenced with the extended R1 strategy were generated using bcl2fastq v2.20.0 (Illumina, Inc)
    bcl2fastq
    suggested: (bcl2fastq , RRID:SCR_015058)
    , containing cDNA sequence, using a customized Python/3.7.3 script (available at github link pending) as follows.
    Python/3.7.3
    suggested: None
    These were downloaded from the UCSC Genome Browser Table Browser(47) after filtering for TRS-dependent transcripts and score > 900 and exporting to gtf format.
    UCSC Genome Browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
    This SARS-CoV-2 reference was appended to the host ChlSab1.1 Ensembl reference. scCoVseq: To unambiguously assign and quantify scRNAseq reads to SARS-CoV-2 RNAs, the cellranger output BAM was filtered for reads mapping to SARS-CoV-2 or ChlSab1.1 references using samtools (version 1.11)(48).
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Scaled SARS-CoV-2 UMI expression of 600 sampled cells was clustered with five methods (k means clustering, hierarchical/Ward clustering, DIANA, mixture model-based clustering, and k medoids clustering) using the clValid (version 0.7) package(53).
    clValid
    suggested: (clValid , RRID:SCR_014626)
    Differential gene expression was performed with edgeR using a generalized linear model quasi-likelihood F test adapted with a term for gene detection rate(55, 56).
    edgeR
    suggested: (edgeR, RRID:SCR_012802)
    Differentially expressed genes with an absolute log2 fold change greater than or equal to 1 and false discovery rate less than 0.05 were considered significant and subject to KEGG enrichment analysis using the KEGG annotations for African Green Monkey as implemented in the edgeR function kegga.
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    Quantification of SARS-CoV-2 sgmRNA Junction Sites: We explored the ability of our extended R1 sequencing to detect SARS-CoV-2 sgmRNA junctions using STARsolo (version 2.7.8a)(57)
    STARsolo
    suggested: (STARsolo, RRID:SCR_021542)
    For all viral infections, analysis was performed with FlowJo software (version 10.7.1, Becton Dickinson), excluding cell doublets and debris and gating according to mock infected populations.
    FlowJo
    suggested: (FlowJo, RRID:SCR_008520)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    It should be noted that there are some limitations to our study. With our dataset, we are unable to know the true infection state of a cell processed for scRNAseq, and therefore we cannot assess the true accuracy of our method to classify infected cells. An additional limitation of our method is that quantification of viral genes with scCoVseq is dependent on accurate annotation of viral RNAs. We derived our annotation based on published empirically-defined TRS-dependent RNAs(12), but this does not preclude the existence of other viral RNAs at time points or in cell types not studied. Importantly, we explicitly exclude TRS-independent RNAs from our analyses. Methods such as STARsolo(57) or sequencing 10X libraries with long-read sequencing(61) may allow for detection and quantification of viral RNAs without reference annotation and irrespective of TRSs.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
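
The significance rule quoted in the Software and Algorithms entries above (absolute log2 fold change greater than or equal to 1 and false discovery rate less than 0.05) can be written as a small filter. This is a sketch for illustration only, assuming per-gene results have already been exported from a differential expression tool. The `GeneResult` type, the function name, and the gene labels in the usage example are hypothetical and do not come from the article.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    gene: str       # gene identifier (hypothetical labels in the example)
    log2_fc: float  # log2 fold change, infected vs. comparison group
    fdr: float      # false discovery rate (adjusted p-value)


def significant_genes(results: List[GeneResult],
                      min_abs_log2_fc: float = 1.0,
                      max_fdr: float = 0.05) -> List[str]:
    """Keep genes with |log2 fold change| >= 1 and FDR < 0.05."""
    return [
        r.gene
        for r in results
        if abs(r.log2_fc) >= min_abs_log2_fc and r.fdr < max_fdr
    ]


# Usage with made-up values:
example = [
    GeneResult("GENE_A", 1.8, 0.001),   # passes both thresholds
    GeneResult("GENE_B", -1.0, 0.049),  # passes: boundary fold change is included
    GeneResult("GENE_C", 2.5, 0.05),    # fails: FDR must be strictly below 0.05
    GeneResult("GENE_D", 0.4, 0.0001),  # fails: fold change too small
]
selected = significant_genes(example)
```

Note that the fold-change bound is inclusive while the FDR bound is strict, matching the wording of the quoted sentence.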