Genome-wide bioinformatic analyses predict key host and viral factors in SARS-CoV-2 pathogenesis

Mariana G. Ferrarini
Avantika Lal
Rita Rebollo
Andreas J. Gruber
Andrea Guarracino
Itziar Martinez Gonzalez
Taylor Floyd
Daniel Siqueira de Oliveira
Justin Shanklin
Ethan Beausoleil
Taneli Pusa
Brett E. Pickett
Vanessa Aguiar-Pulido

This article has been Reviewed by the following groups


Listed in

Evaluated articles (ScreenIT)

Abstract

The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells. Well-known immunoregulatory genes including CSF2, IL32, IL-6 and SERPINA3 were differentially expressed, while immunoregulatory transposable element families were upregulated. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and eukaryotic initiation factor 4 (eIF4b). We also identified a viral sequence variant with a statistically significant skew associated with age of infection that may contribute to intracellular host–pathogen interactions. These findings can help identify host mechanisms that can be targeted by prophylactics and/or therapeutics to reduce the severity of COVID-19.

Version published to 10.1038/s42003-021-02095-0
May 17, 2021
Version published to 10.21203/rs.3.rs-63136/v1 on Research Square
Aug 31, 2020

SciScore for 10.1101/2020.07.28.225581

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
The first dataset, GSE147507 [10], includes gene expression measurements from three cell lines derived from the human respiratory system (NHBE, A549, Calu-3) infected either with SARS-CoV-2, influenza A virus (IAV), respiratory syncytial virus (RSV), or human parainfluenza virus 3 (HPIV3).	A549 suggested: NCI-DTP Cat# A549, RRID:CVCL_0023
Software and Algorithms
Sentences	Resources
Datasets: Two datasets were downloaded from the Gene Expression Omnibus (GEO) database, hosted at the National Center for Biotechnology Information (NCBI).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)
FastQC (v0.11.9; https://github.com/s-andrews/FastQC) and MultiQC (v1.9) [20] were employed to assess the quality of the data used and the need to trim reads and/or remove adapters.	FastQC suggested: (FastQC, RRID:SCR_014583) MultiQC suggested: (MultiQC, RRID:SCR_014982)
Selected datasets were mapped to the human reference genome (GENCODE Release 19, GRCh37.p13) utilizing STAR (v2.7.3a) [17].	STAR suggested: (STAR, RRID:SCR_015899)
Resulting SAM files were converted to BAM files employing samtools (v1.9) [43].	samtools suggested: (SAMTOOLS, RRID:SCR_002105)
Next, read quantification was performed using StringTie (v2.1.1) [60] and the output data was postprocessed with an auxiliary Python script provided by the same developers to produce files ready for subsequent downstream analyses.	Python suggested: (IPython, RRID:SCR_001658)
Finally, an exploratory data analysis was carried out based on the transformed values obtained after applying the variance stabilizing transformation [3] implemented in the vst() function of DESeq2 [48].	DESeq2 suggested: (DESeq, RRID:SCR_000154)
GO terms with a significant adjusted p-value of less than 0.05 were reduced to representative non-redundant terms with the use of REVIGO [73].	REVIGO suggested: (REViGO, RRID:SCR_005825)
The significant results for all comparisons from publicly available data from KEGG, Reactome, Panther, BioCarta, and NCI were then compiled to facilitate downstream comparison.	KEGG suggested: (KEGG, RRID:SCR_012773) Panther suggested: (PANTHER, RRID:SCR_004869) BioCarta suggested: (BioCarta Pathways, RRID:SCR_006917)
Hypergeometric pathway enrichments were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID, v6.8) [30].	DAVID suggested: (DAVID, RRID:SCR_001881)
Isoform Analysis: Using transcript quantification data from StringTie as input, we identified isoform switching events and their predicted functional consequences with the IsoformSwitchAnalyzeR R package (v1.11.3) [81].	StringTie suggested: (StringTie, RRID:SCR_016323)
Following filtering for significant isoforms, we externally predicted their coding capabilities, protein structure stability, peptide signaling, and shifts in protein domain usage using The Coding-Potential Assessment Tool (CPAT) [82], IUPred2 [18], SignalP [2] and Pfam tools respectively [19].	SignalP suggested: (SignalP, RRID:SCR_015644) Pfam suggested: (Pfam, RRID:SCR_004726)
Viral genotype-phenotype correlation: All complete SARS-CoV-2 genomes from GISAID, together with the GenBank reference sequence, were aligned with MAFFT (v7.464) within a high-performance computing environment using 1 thread and the --nomemsave parameter [55].	MAFFT suggested: (MAFFT, RRID:SCR_011811)
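
The table above mentions hypergeometric pathway enrichment. As a rough illustration of what such a test computes, the sketch below evaluates the upper-tail hypergeometric probability of seeing at least a given number of pathway genes in a list of differentially expressed genes. The function name, argument names and the toy numbers are assumptions made for this example only; DAVID's own scoring may differ from this plain tail probability.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, pathway_size: int,
                                sample_size: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    population   : total number of background genes (hypothetical input)
    pathway_size : background genes annotated to the pathway
    sample_size  : number of differentially expressed genes drawn
    overlap      : drawn genes that belong to the pathway
    """
    # The overlap can never exceed either the pathway or the sample size.
    upper = min(pathway_size, sample_size)
    # Number of ways to choose the sample from the whole background.
    total = comb(population, sample_size)
    # Sum the ways to obtain k pathway genes for every k >= overlap.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers chosen only to exercise the function.
    print(hypergeom_enrichment_pvalue(10, 4, 3, 2))
```

A small p-value here would indicate that the overlap is larger than expected by chance; in practice such p-values are usually corrected for multiple testing across pathways.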

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

SciScore for 10.1101/2020.07.28.225581

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
From this list, we excluded all TE families detected in A549 cells infected with the other viruses.	A549 suggested: NCI-DTP Cat# A549, RRID:CVCL_0023
Software and Algorithms
Sentences	Resources
Materials and Methods. Datasets: Two datasets were downloaded from the Gene Expression Omnibus (GEO) database, hosted at the National Center for Biotechnology Information (NCBI).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)
FastQC (v0.11.9; https://github.com/s-andrews/FastQC) and MultiQC (v1.9) [20] were employed to assess the quality of the data and the need to trim reads and/or remove adapters.	FastQC suggested: (FastQC, RRID:SCR_014583) MultiQC suggested: (MultiQC, RRID:SCR_014982)
Selected datasets were mapped to the human reference genome (GENCODE Release 19, GRCh37.p13) utilizing STAR (v2.7.3a) [17].	STAR suggested: (STAR, RRID:SCR_015899)
Resulting SAM files were converted to BAM files employing samtools (v1.9) [43].	samtools suggested: (Samtools, RRID:SCR_002105)
Next, read quantification was performed using StringTie (v2.1.1) [60] and the output data was postprocessed with an auxiliary Python script provided by the same developers to produce files ready for subsequent downstream analyses.	StringTie suggested: (StringTie, RRID:SCR_016323) Python suggested: (IPython, RRID:SCR_001658)
GO terms with a significant adjusted p-value of less than 0.05 were reduced to representative non-redundant terms with the use of REVIGO [73].	REVIGO suggested: (REViGO, RRID:SCR_005825)
The significant results for all comparisons from publicly available data from KEGG, Reactome, Panther, BioCarta, and NCI were then compiled to facilitate downstream comparison.	KEGG suggested: (KEGG, RRID:SCR_012773) Panther suggested: (PANTHER, RRID:SCR_004869) BioCarta suggested: (BioCarta Pathways, RRID:SCR_006917)
Differentially expressed TEs (DETEs) in infected vs mock conditions were detected using DESeq2 with a matrix of counts for genes and TE families as input.	DESeq2 suggested: (DESeq2, RRID:SCR_015687)
Viral genotype-phenotype correlation: All complete SARS-CoV-2 genomes from GISAID, together with the GenBank reference sequence, were aligned with MAFFT (v7.464) within a high-performance computing environment using 1 thread and the --nomemsave parameter [55].	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Pathway enrichment for each dataset (SPIA and DAVID merged into one file).	DAVID suggested: (DAVID, RRID:SCR_001881)
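
The table above quotes an adjusted p-value cutoff of 0.05 for GO terms. The source does not say which correction was used, so the following is only a sketch that assumes the Benjamini-Hochberg procedure; the function names and the placeholder term labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_terms(term_pvalues, alpha=0.05):
    """Keep the terms whose adjusted p-value is strictly below alpha."""
    terms = list(term_pvalues)
    adjusted = benjamini_hochberg([term_pvalues[t] for t in terms])
    return [t for t, q in zip(terms, adjusted) if q < alpha]


if __name__ == "__main__":
    # Placeholder labels and made-up p-values, for illustration only.
    toy = {"term_a": 0.01, "term_b": 0.04, "term_c": 0.03, "term_d": 0.20}
    print(significant_terms(toy))
```

Note that a term with a raw p-value below 0.05 can still fail the adjusted cutoff, which is why the filter is applied to adjusted rather than raw values.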

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

About SciScore

SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources.
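
To make the idea of checking a sentence for RRIDs concrete, here is a minimal sketch that pulls identifiers such as RRID:SCR_014583 or RRID:CVCL_0023 out of free text. The regular expression and function names are assumptions made for this illustration and do not reflect how SciScore itself is implemented.

```python
import re

# Assumed shape of an identifier: "RRID:" followed by a letter prefix,
# an underscore, and an alphanumeric accession (hypothetical pattern).
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9-]+)")


def extract_rrids(text):
    """Return every identifier that follows an 'RRID:' prefix in text."""
    return RRID_PATTERN.findall(text)


def is_missing_rrid(sentence):
    """Flag a sentence that carries no RRID at all."""
    return not extract_rrids(sentence)


if __name__ == "__main__":
    row = "FastQC suggested: (FastQC, RRID:SCR_014583)"
    print(extract_rrids(row))
    print(is_missing_rrid("Resulting SAM files were converted to BAM files."))
```

A real checker would also need to validate each identifier against a registry, which this sketch does not attempt.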

SciScore for 10.1101/2020.07.28.225581

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
From this list, we excluded all TE families detected in A549 cells infected with the other viruses.	A549 suggested: NCI-DTP Cat# A549, RRID:CVCL_0023
Software and Algorithms
Sentences	Resources
Materials and Methods. Datasets: Two datasets were downloaded from the Gene Expression Omnibus (GEO) database, hosted at the National Center for Biotechnology Information (NCBI).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)
FastQC (v0.11.9; https://github.com/s-andrews/FastQC) and MultiQC (v1.9) [20] were employed to assess the quality of the data and the need to trim reads and/or remove adapters.	FastQC suggested: (FastQC, RRID:SCR_014583) MultiQC suggested: (MultiQC, RRID:SCR_014982)
Selected datasets were mapped to the human reference genome (GENCODE Release 19, GRCh37.p13) utilizing STAR (v2.7.3a) [17].	STAR suggested: (STAR, RRID:SCR_015899)
Resulting SAM files were converted to BAM files employing samtools (v1.9) [43].	samtools suggested: (Samtools, RRID:SCR_002105)
Next, read quantification was performed using StringTie (v2.1.1) [60] and the output data was postprocessed with an auxiliary Python script provided by the same developers to produce files ready for subsequent downstream analyses.	StringTie suggested: (StringTie, RRID:SCR_016323) Python suggested: (IPython, RRID:SCR_001658)
DESeq2 (v1.26.0) [47] was used in both cases to identify differentially expressed genes (DEGs).	DESeq2 suggested: (DESeq, RRID:SCR_000154)
GO terms with a significant adjusted p-value of less than 0.05 were reduced to representative non-redundant terms with the use of REVIGO [73].	REVIGO suggested: (REViGO, RRID:SCR_005825)
The significant results for all comparisons from publicly available data from KEGG, Reactome, Panther, BioCarta, and NCI were then compiled to facilitate downstream comparison.	KEGG suggested: (KEGG, RRID:SCR_012773) Panther suggested: (PANTHER, RRID:SCR_004869) BioCarta suggested: (BioCarta Pathways, RRID:SCR_006917)
Hypergeometric pathway enrichments were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID, v6.8) [30].	DAVID suggested: (DAVID, RRID:SCR_001881)
Viral genotype-phenotype correlation: All complete SARS-CoV-2 genomes from GISAID, together with the GenBank reference sequence, were aligned with MAFFT (v7.464) within a high-performance computing environment using 1 thread and the --nomemsave parameter [55].	MAFFT suggested: (MAFFT, RRID:SCR_011811)

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

SciScore for 10.1101/2020.07.28.225581

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
Here we report a subset of non-redundant reduced terms consistently enriched in more than one SARS-COV-2 cell line which were not detected in the other viruses’ datasets.	SARS-COV-2 suggested: None
NHBE cells expressed 4 known IL-6 isoforms, while A549 cells expressed 1 unknown and 6 known isoforms.	A549 suggested: NCI-DTP Cat# A549, CVCL_0023
This allowed us to identify 16 families that were specifically upregulated in Calu-3 and A549 cells infected with SARS-CoV-2 and not in the other viral infections.	Calu-3 suggested: BCRJ Cat# 0264, CVCL_0609
Software and Algorithms
Sentences	Resources
Materials and Methods Datasets Two datasets were downloaded from the Gene Expression Omnibus (GEO) database, hosted at the National Center for Biotechnology Information (NCBI).	Gene Expression Omnibus suggested: (Gene Expression Omnibus (GEO), SCR_005012)
FastQC (v0.11.9; https://github.com/s-andrews/FastQC) and MultiQC (v1.9) [20] were employed to assess the quality of the data and the need to trim reads and/or remove adapters.	FastQC suggested: (FastQC, SCR_014583) MultiQC suggested: (MultiQC, SCR_014982)
Selected datasets were mapped to the human reference genome (GENCODE Release 19, GRCh37.p13) utilizing STAR (v2.7.3a) [17].	STAR suggested: (STAR, SCR_015899)
Resulting SAM files were converted to BAM files employing samtools (v1.9) [43].	samtools suggested: (Samtools, SCR_002105)
Next, read quantification was performed using StringTie (v2.1.1) [60] and the output data was postprocessed with an auxiliary Python script provided by the same developers to produce files ready for subsequent downstream analyses.	Python suggested: (IPython, SCR_001658)
Finally, an exploratory data analysis was carried out based on the transformed values obtained after applying the variance stabilizing transformation [3] implemented in the vst() function of DESeq2 [48].	DESeq2 suggested: (DESeq, SCR_000154)
GO terms with a significant adjusted p-value of less than 0.05 were reduced to representative non-redundant terms with the use of REVIGO [73].	REVIGO suggested: (REViGO, SCR_005825)
The significant results for all comparisons from publicly available data from KEGG, Reactome, Panther, BioCarta, and NCI were then compiled to facilitate downstream comparison.	Panther suggested: (PANTHER, SCR_004869) BioCarta suggested: (BioCarta Pathways, SCR_006917)
Hypergeometric pathway enrichments were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID, v6.8) [30].	DAVID suggested: (DAVID, SCR_001881)
Isoform Analysis: Using transcript quantification data from StringTie as input, we identified isoform switching events and their predicted functional consequences with the IsoformSwitchAnalyzeR R package (v1.11.3) [79].	StringTie suggested: (StringTie, SCR_016323)
Following filtering for significant isoforms, we externally predicted their coding capabilities, protein structure stability, peptide signaling, and shifts in protein domain usage using The Coding-Potential Assessment Tool (CPAT) [80], IUPred2 [18], SignalP [2] and Pfam tools respectively [19].	SignalP suggested: (SignalP, SCR_015644) Pfam suggested: (Pfam, SCR_004726)
Viral genotype-phenotype correlation: All complete SARS-CoV-2 genomes from GISAID, together with the GenBank reference sequence, were aligned with MAFFT (v7.464) within a high-performance computing environment using 1 thread and the --nomemsave parameter [55].	MAFFT suggested: (MAFFT, SCR_011811)
Interestingly, we were able to detect enriched KEGG pathways common to at least two SARS-CoV-2 infected cell types and absent from the other virus-infected datasets (Figure 2, Supplementary Table 2B).	KEGG suggested: (KEGG, SCR_012773)

Data from additional tools added to each annotation on a weekly basis.

Version published to 10.1101/2020.07.28.225581 on bioRxiv
Jul 29, 2020

Related articles

SARS-CoV-2 Nsp2 reprograms host immunity to drive pathogenic inflammation

This article has 11 authors:
1. Émile Lacasse
2. Isabelle Dubuc
3. Joannie Leclerc
4. Annie Gravel
5. Leslie Gudimard
6. Charles Joly Beauparlant
7. Marion Faure
8. Patrick Fortin
9. Marie-Renée Blanchet
10. Arnaud Droit
11. Louis Flamand
This article has no evaluations. Latest version: May 7, 2026

Distinct virus-derived circular RNA molecule influences host response during SARS-CoV-2 infection

This article has 8 authors:
1. Elysse N. Grossi-Soyster
2. Rebekah C. Gullberg
3. Arjun Rustagi
4. Jae Seung Lee
5. Catherine A. Blish
6. Sara Cherry
7. Julia Salzman
8. Peter Sarnow
This article has no evaluations. Latest version: Apr 28, 2026

Genome-wide computational prediction of miRNAs encoded by influenza A virus (H3N2) predicts target genes involved in pulmonary and antiviral innate immunity

This article has 3 authors:
1. Maria Ajmal Siddiqi
2. Harsh Kumar
3. Mohit Mazumder
This article has no evaluations. Latest version: May 18, 2026
