Secondary analysis of transcriptomes of SARS-CoV-2 infection models to characterize COVID-19

Abstract

Knowledge about the molecular mechanisms driving COVID-19 pathophysiology and outcomes is still limited. To learn more about COVID-19 pathophysiology, we performed secondary analyses of transcriptomic data from two in vitro (Calu-3 and Vero E6 cells) and one in vivo (Ad5-hACE2-sensitized mice) models of SARS-CoV-2 infection. We found 1467 conserved differentially expressed host genes (differentially expressed in at least two of the three model system transcriptomes compared) in SARS-CoV-2 infection. To find potential genetic factors associated with COVID-19, we analyzed these conserved differentially expressed genes using known human genotype-phenotype associations. Genome-wide association study (GWAS) enrichment analysis showed evidence of enrichment for GWAS loci associated with platelet functions, blood pressure, body mass index, respiratory functions, and neurodegenerative and neuropsychiatric diseases, among others. Since human protein complexes are known to be directly related to viral infection, we combined the conserved transcriptomic signature with SARS-CoV-2-host protein-protein interaction data, analyzed the resulting network, and found more than 150 gene clusters. Of these, 29 clusters (with 5 or more genes in each cluster) had at least one gene encoding a protein that interacts with the SARS-CoV-2 proteome. These clusters were enriched for different cell types in the lung, including epithelial, endothelial, and immune cell types, suggesting their pathophysiological relevance to COVID-19. Finally, pathway analysis of the conserved differentially expressed genes and gene clusters showed alterations in several pathways and biological processes that could help in understanding, or forming hypotheses about, the molecular signatures that induce the pathophysiological changes, risks, or sequelae of COVID-19.
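The "conserved" criterion in the abstract (differentially expressed in at least two of the three models) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the input layout, and the placeholder gene identifiers are all hypothetical, and the sketch assumes that "unambiguously" means a gene is dropped whenever the models disagree on its direction of change.

```python
from collections import Counter


def conserved_degs(signatures, min_models=2):
    """Return {gene: direction} for genes that are differentially expressed
    in at least `min_models` signatures, all with the same direction.

    `signatures` maps a model name to a {gene: "up" | "down"} dict.
    Assumption: a gene with conflicting directions across models is
    treated as ambiguous and excluded.
    """
    all_genes = set()
    for sig in signatures.values():
        all_genes.update(sig)

    conserved = {}
    for gene in all_genes:
        # Count how many models report each direction for this gene.
        directions = Counter(
            sig[gene] for sig in signatures.values() if gene in sig
        )
        if len(directions) != 1:
            continue  # conflicting directions: ambiguous
        direction, n_models = next(iter(directions.items()))
        if n_models >= min_models:
            conserved[gene] = direction
    return conserved


# Toy input with placeholder gene names (not real results).
signatures = {
    "Calu-3": {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"},
    "Vero E6": {"GENE_A": "up", "GENE_C": "down", "GENE_D": "up"},
    "Ad5-hACE2 mouse": {"GENE_B": "down", "GENE_C": "up", "GENE_D": "up"},
}
result = conserved_degs(signatures)
```

In this toy input, GENE_C is reported in all three models but with conflicting directions, so it is excluded under the stated assumption, while the other three genes each agree in two models and are kept.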

SciScore for 10.1101/2020.08.27.270835:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
The second transcriptome signature is based on mRNA profiles of control (mock-infected) and 24h post-SARS-CoV-2-infection (USA-WA1/2020, MOI = 0.3) in Vero E6 cells (kidney epithelial cells extracted from an African green monkey) (GSE153940; [16]).	Vero E6 suggested: None
Differentially expressed genes from SARS-CoV-2 infection models and SARS-CoV-2-Human virus-host protein-protein interactome - Network analysis: For all conserved DEGs (genes differentially expressed unambiguously in at least 2 of the 3 transcriptomic sets compared, i.e., Calu-3, VeroE6, or mouse model - Ad5-hACE2) of SARS-CoV-2 infection models and 336 SARS-CoV-2-human PPI genes, we extracted interactions with a score of ≥ 0.9 or experimental interaction score of 0.7 or more from STRING v11 [26].	Calu-3 suggested: (KCLB Cat# 30055, RRID:CVCL_0609)
Software and Algorithms
Sentences	Resources
The raw data was downloaded from NCBI Sequence Read Archive (ProcessPublicData module) and the technical replicates were merged for individual samples before processing them (Process-RNASeq_SingleEnd module)	NCBI Sequence Read Archive suggested: (NCBI Sequence Read Archive (SRA), RRID:SCR_004891)
Quality checks [19] and quality trimming [20] were conducted prior to the transcript mapping/quantification step using the RSEM package [21].	RSEM suggested: (RSEM, RRID:SCR_013027)
RUVSeq [23] was used to remove potential variation and sequencing effects from the data before performing DE analysis using edgeR [24]	RUVSeq suggested: (RUVSeq, RRID:SCR_006263) edgeR suggested: (edgeR, RRID:SCR_012802)
For obtaining the human ortholog genes for mouse (Mus musculus) and the green monkey (Chlorocebus sabaeus), we used ortholog mappings from the NCBI’s HomoloGene and ENSEMBL databases.	HomoloGene suggested: (HomoloGene, RRID:SCR_002924) ENSEMBL suggested: (Ensembl, RRID:SCR_002344)
Differentially expressed genes from SARS-CoV-2 infection models and SARS-CoV-2-Human virus-host protein-protein interactome - Network analysis: For all conserved DEGs (genes differentially expressed unambiguously in at least 2 of the 3 transcriptomic sets compared, i.e., Calu-3, VeroE6, or mouse model - Ad5-hACE2) of SARS-CoV-2 infection models and 336 SARS-CoV-2-human PPI genes, we extracted interactions with a score of ≥ 0.9 or experimental interaction score of 0.7 or more from STRING v11 [26].	STRING suggested: (STRING, RRID:SCR_005223)
This DEG-PPI network was clustered (through Cytoscape v3.8.0 [27]) using Markov Clustering Algorithm (MCL) (available as part of the ClusterMaker Plugin [28] in Cytoscape) to identify gene clusters.	Cytoscape suggested: (Cytoscape, RRID:SCR_003032)
Functional enrichment analysis: Functional enrichment analysis was carried out on the various DEG sets and gene clusters (from MCL clustering) using the ToppFun application of the ToppGene suite [30] and Enrichr [31].	ToppGene suggested: (ToppGene Suite, RRID:SCR_005726) Enrichr suggested: (Enrichr, RRID:SCR_001575)
We also generated aging human lung and liver DEG sets from the GTEx data using the BioJupies tool [41].	BioJupies suggested: (BioJupies, RRID:SCR_016346)
Additionally, we also used a curated set of 307 human aging genes from GenAge database [42] (Build February 2020) for enrichment analysis (Supplementary File 2).	GenAge suggested: (GenAge, RRID:SCR_010223)
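
The interaction filter quoted in the table above (a STRING score of ≥ 0.9, or an experimental interaction score of 0.7 or more) could be sketched as follows. This is an illustrative sketch, not the authors' code: the function names and the edge tuple layout are hypothetical, scores are assumed to be on a 0 to 1 scale (STRING's bulk download files typically store them as integers scaled by 1000, in which case they would need dividing by 1000 first), and requiring both endpoints to be in the gene set is an assumption.

```python
def keep_edge(combined, experimental, min_combined=0.9, min_experimental=0.7):
    """True if an interaction passes either threshold from the text."""
    return combined >= min_combined or experimental >= min_experimental


def filter_network(edges, genes):
    """Keep edges among `genes` that pass the score filter.

    `edges` is an iterable of (gene_a, gene_b, combined, experimental)
    tuples with scores on a 0-1 scale (assumed layout).
    Assumption: both endpoints must belong to `genes`, i.e. the union of
    conserved DEGs and SARS-CoV-2-human PPI genes.
    """
    kept = []
    for gene_a, gene_b, combined, experimental in edges:
        if gene_a in genes and gene_b in genes and keep_edge(combined, experimental):
            kept.append((gene_a, gene_b))
    return kept


# Toy edges with placeholder identifiers (not real STRING data).
edges = [
    ("G1", "G2", 0.95, 0.10),  # passes on combined score
    ("G2", "G3", 0.60, 0.75),  # passes on experimental score
    ("G3", "G4", 0.80, 0.50),  # fails both thresholds
    ("G1", "G9", 0.99, 0.99),  # G9 is outside the gene set
]
genes = {"G1", "G2", "G3", "G4"}
network = filter_network(edges, genes)
```

The resulting edge list is the kind of input that a clustering step such as MCL in Cytoscape would then operate on.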

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Our study holds certain limitations. The transcriptomics data used is from in vitro and in vivo model systems with no samples from human patients of COVID-19. Although there are emerging data sets from COVID-19 patients, they are currently limited. For example, from the study that reported SARS-CoV-2-triggered transcriptome in Calu-3 cells, transcriptomics data from lung samples of a COVID-19 patient are available. However, these samples were from just one patient. Nevertheless, the conserved signature from our study can be used to compare with more robust transcriptomic signatures from COVID-19 patients as and when they are available. For example, comparing the DEGs from nasopharyngeal swabs from human patients (GSE152075) with the DEGs from the three SARS-CoV-2 infection models showed strong concordance with the Calu-3 and the humanized mouse model (Supplementary File 16). Surprisingly, there was no significant correlation between the VeroE6 upregulated genes and COVID-19 patient nasopharyngeal upregulated genes. Although Vero E6 is the most widely used cell line to replicate and isolate SARS-CoV-2 [4], the expression level of TMPRSS2, the protease that SARS-CoV-2 uses to prime its spike protein [80, 81], is reported to be quite low in this clone. Additionally, there were many DEGs (1180 upregulated and 1734 downregulated) that are specific to human patients, suggesting the inherent limitations of current in vitro and in vivo models of COVID-19. The STRING-based P...

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 48, 51 and 52. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.