Unambiguous detection of SARS-CoV-2 subgenomic mRNAs with single-cell RNA sequencing

Abstract

Single-cell RNA sequencing (scRNA-Seq) studies have provided critical insight into the pathogenesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19). scRNA-Seq library preparation methods and data processing workflows are generally designed for the detection and quantification of eukaryotic host mRNAs and not viral RNAs. Here, we compare different scRNA-Seq library preparation methods for their ability to quantify and detect SARS-CoV-2 RNAs with a focus on subgenomic mRNAs (sgmRNAs). We show that compared to 10X Genomics Chromium Next GEM Single Cell 3′ (10X 3′) libraries or 10X Genomics Chromium Next GEM Single Cell V(D)J (10X 5′) libraries sequenced with standard read configurations, 10X 5′ libraries sequenced with an extended length read 1 (R1) that covers both cell barcode and transcript sequence (termed “10X 5′ with extended R1”) increase the number of unambiguous reads spanning leader-sgmRNA junction sites. We further present a data processing workflow, single-cell coronavirus sequencing (scCoVseq), which quantifies reads unambiguously assigned to viral sgmRNAs or viral genomic RNA (gRNA). We find that combining 10X 5′ with extended R1 library preparation/sequencing and scCoVseq data processing maximizes the number of viral UMIs per cell quantified by scRNA-Seq. Corresponding sgmRNA expression levels are highly correlated with expression in matched bulk RNA-Seq data sets quantified with established tools for SARS-CoV-2 analysis. Using this scRNA-Seq approach, we find that SARS-CoV-2 gene expression is highly correlated across individual infected cells, which suggests that the proportion of viral sgmRNAs remains generally consistent throughout infection. 
Taken together, these results and corresponding data processing workflow enable robust quantification of coronavirus sgmRNA expression at single-cell resolution, thereby supporting high-resolution studies of viral RNA processes in individual cells.

IMPORTANCE

Single-cell RNA sequencing (scRNA-Seq) has emerged as a valuable tool to study host-virus interactions, especially for coronavirus disease 2019 (COVID-19). Here we compare the performance of different scRNA-Seq library preparation methods and sequencing strategies to detect SARS-CoV-2 RNAs and develop a data processing workflow to quantify unambiguous sequence reads derived from SARS-CoV-2 genomic RNA and subgenomic mRNAs. After establishing a workflow that maximizes the detection of SARS-CoV-2 subgenomic mRNAs, we explore patterns of SARS-CoV-2 gene expression across cells with variable levels of total viral RNA, assess host gene expression differences between infected and bystander cells, and identify non-canonical and lowly abundant SARS-CoV-2 RNAs. The sequencing and data processing strategies developed here can enhance studies of coronavirus RNA biology at single-cell resolution and thereby contribute to our understanding of viral pathogenesis.

SciScore for 10.1101/2021.11.22.469642:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Antibodies
Sentences	Resources
SARS-CoV nucleocapsid (N) antibody (clone 1C7C7) (kindly provided by Thomas Moran, Icahn School of Medicine at Mount Sinai, New York, NY), conjugated to AlexaFluor 647, was diluted 1:400 in perm-wash buffer and added directly to samples.	SARS-CoV nucleocapsid (N) suggested: None
Blocked coverslips were incubated with mouse anti-SARS-CoV N antibody (clone 1C7, 1:500 in 4% BSA PBS) overnight at 4°C, washed three times with PBS, and incubated for 45 minutes with 1:500 AlexaFluor 488-conjugated anti-mouse (Invitrogen A11001, 1:500 in 4% BSA PBS) plus DAPI (Thermo Fisher Scientific D1306, 1:1000 in 4% BSA PBS) at room temperature.	anti-SARS-CoV N suggested: None; anti-mouse suggested: (Thermo Fisher Scientific Cat# A-11001, RRID:AB_2534069)
Experimental Models: Cell Lines
Sentences	Resources
All SARS-CoV-2 propagations and experiments were performed in a Biosafety Level 3 facility in compliance with institutional protocols and federal guidelines. scRNAseq: For scRNAseq experiments, Vero-E6 cells in 6-well plates were infected with SARS-CoV-2 at an MOI of 0.1, or with an equivalent volume of control media, in reduced-serum media (2% FBS) for 24 hours.	Vero-E6 suggested: None
Immunofluorescence microscopy: Vero E6 were seeded in 6-well plates (Falcon REF-353046) with one coverslip (Fisher Scientific 12-550-143) per well.	Vero E6 suggested: None
Software and Algorithms
Sentences	Resources
Fastq files for 5′ libraries sequenced with the extended R1 strategy were generated using bcl2fastq v2.20.0 (Illumina, Inc)	bcl2fastq suggested: (bcl2fastq , RRID:SCR_015058)
, containing cDNA sequence, using a customized Python/3.7.3 script (available at github link pending) as follows.	Python/3.7.3 suggested: None
These were downloaded from the UCSC Genome Browser Table Browser(47) after filtering for TRS-dependent transcripts and score > 900 and exporting to gtf format.	UCSC Genome Browser suggested: (UCSC Genome Browser, RRID:SCR_005780)
This SARS-CoV-2 reference was appended to the host ChlSab1.1 Ensembl reference. scCoVseq: To unambiguously assign and quantify scRNAseq reads to SARS-CoV-2 RNAs, the cellranger output BAM was filtered for reads mapping to SARS-CoV-2 or ChlSab1.1 references using samtools (version 1.11)(48).	Ensembl suggested: (Ensembl, RRID:SCR_002344) samtools suggested: (SAMTOOLS, RRID:SCR_002105)
Scaled SARS-CoV-2 UMI expression of 600 sampled cells was clustered with five methods (k-means clustering, hierarchical/Ward clustering, DIANA, mixture model-based clustering, and k-medoids clustering) using the clValid (version 0.7) package(53).	clValid suggested: (clValid , RRID:SCR_014626)
Differential gene expression was performed with edgeR using a generalized linear model quasi-likelihood F test adapted with a term for gene detection rate(55, 56).	edgeR suggested: (edgeR, RRID:SCR_012802)
Differentially expressed genes with an absolute log2 fold change greater than or equal to 1 and false discovery rate less than 0.05 were considered significant and subject to KEGG enrichment analysis using the KEGG annotations for African Green Monkey as implemented in the edgeR function kegga.	KEGG suggested: (KEGG, RRID:SCR_012773)
Quantification of SARS-CoV-2 sgmRNA Junction Sites: We explored the ability of our extended R1 sequencing to detect SARS-CoV-2 sgmRNA junctions using STARsolo (version 2.7.8a)(57)	STARsolo suggested: (STARsolo, RRID:SCR_021542)
For all viral infections, analysis was performed with FlowJo software (version 10.7.1, Becton Dickinson), excluding cell doublets and debris and gating according to mock infected populations.	FlowJo suggested: (FlowJo, RRID:SCR_008520)
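
The extended R1 strategy listed in the table above depends on separating the barcode portion of each long R1 read from the cDNA portion that follows it. The authors' own script is not reproduced here; the following is only a minimal sketch of the idea. It assumes the usual 10x 5′ read layout of a 16-nt cell barcode, a 10-nt UMI, and a 13-nt template switch oligo at the start of R1. Those lengths, the `min_cdna` cutoff, and the name `split_extended_r1` are illustrative assumptions, not details taken from the source.

```python
from typing import NamedTuple, Optional

# Assumed 10x 5' R1 layout; these lengths are NOT stated in the source text.
BARCODE_LEN = 16  # cell barcode
UMI_LEN = 10      # unique molecular identifier
TSO_LEN = 13      # template switch oligo preceding the cDNA


class SplitRead(NamedTuple):
    barcode: str
    umi: str
    cdna: str
    cdna_qual: str


def split_extended_r1(seq: str, qual: str, min_cdna: int = 20) -> Optional[SplitRead]:
    """Split one extended R1 read into barcode, UMI, and cDNA parts.

    Returns None when fewer than `min_cdna` bases of cDNA remain after the
    barcode, UMI, and TSO (hypothetical cutoff chosen for illustration).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    prefix = BARCODE_LEN + UMI_LEN + TSO_LEN
    if len(seq) - prefix < min_cdna:
        return None
    return SplitRead(
        barcode=seq[:BARCODE_LEN],
        umi=seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN],
        cdna=seq[prefix:],
        cdna_qual=qual[prefix:],
    )
```

In a real workflow the barcode and UMI would be written to one FASTQ file and the cDNA to another, so that standard single-cell aligners can treat the cDNA portion as an ordinary transcript read.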

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

It should be noted that there are some limitations to our study. With our dataset, we are unable to know the true infection state of a cell processed for scRNAseq, and therefore we cannot assess the true accuracy of our method to classify infected cells. An additional limitation of our method is that quantification of viral genes with scCoVseq is dependent on accurate annotation of viral RNAs. We derived our annotation based on published empirically-defined TRS-dependent RNAs(12), but this does not preclude the existence of other viral RNAs at time points or in cell types not studied. Importantly, we explicitly exclude TRS-independent RNAs from our analyses. Methods such as STARsolo(57) or sequencing 10X libraries with long-read sequencing(61) may allow for detection and quantification of viral RNAs without reference annotation and irrespective of TRSs.
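
The dependence on annotation described above can be illustrated with a small sketch of how a junction-spanning read might be assigned to a single annotated sgmRNA. This is not the scCoVseq implementation. It assumes that each aligned read is available as a 1-based start position plus a CIGAR string, that leader-to-body junctions appear as `N` gaps, and that the caller supplies annotated acceptor coordinates. The names `junctions` and `assign_sgmrna`, the `leader_end` default of 75, and the `tol` window are all hypothetical choices for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance the reference


def junctions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (donor, acceptor) reference coordinates for every N gap.

    donor is the last aligned base before the gap and acceptor is the first
    aligned base after it, both 1-based.
    """
    found = []
    ref = pos  # 1-based coordinate of the next reference base
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            found.append((ref - 1, ref + n))
        if op in REF_CONSUMING:
            ref += n
    return found


def assign_sgmrna(
    pos: int,
    cigar: str,
    acceptors: Dict[str, int],
    leader_end: int = 75,  # assumed end of the leader region
    tol: int = 5,          # assumed tolerance around annotated acceptors
) -> Optional[str]:
    """Name the one annotated sgmRNA supported by a read, or None.

    A read counts only if a gap starts inside the leader and lands within
    `tol` bases of exactly one annotated acceptor site.
    """
    hits = set()
    for donor, acceptor in junctions(pos, cigar):
        if donor > leader_end:
            continue
        for name, site in acceptors.items():
            if abs(acceptor - site) <= tol:
                hits.add(name)
    return hits.pop() if len(hits) == 1 else None
```

With a toy annotation such as `{"geneA": 1070, "geneB": 2000}` (made-up coordinates), a read starting at position 40 with CIGAR `30M1000N50M` is assigned to `geneA`, while an unspliced read or a read whose gap starts outside the leader returns `None`. Junctions that are absent from the annotation are silently dropped, which is the limitation the authors point out.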

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
