Comparative Genomic Analysis of Rapidly Evolving SARS-CoV-2 Viruses Reveal Mosaic Pattern of Phylogeographical Distribution

Abstract

The Coronavirus Disease 2019 (COVID-19) outbreak that started in Wuhan, China, in December 2019 has spread worldwide, emerging as a global pandemic. The severe respiratory pneumonia caused by the novel SARS-CoV-2 has so far claimed more than 60,000 lives and has affected human lives worldwide. However, as the novel SARS-CoV-2 displays high transmission rates, the genomic features underlying its severity need to be fully understood. We studied the complete genomes of 95 SARS-CoV-2 strains from different geographical regions worldwide to uncover the pattern of spread of the virus. We show that there is no direct transmission pattern of the virus among neighboring countries, suggesting that the outbreak is a result of travel of infected humans to different countries. We revealed unique single nucleotide polymorphisms (SNPs) in nsp13-16 (ORF1b polyprotein) and the S-protein within 10 viral isolates from the USA. These viral proteins are involved in RNA replication, indicating that more highly evolved viral strains are circulating in the population of the USA than in other countries. Furthermore, we found an amino acid addition in nsp16 (mRNA cap-1 methyltransferase) of the USA isolate (MT188341), leading to a shift in the amino acid frame from position 2540 onwards. Through the construction of a SARS-CoV-2-human interactome, we further revealed that multiple host proteins (PHB, PPP1CA, TGF-β, SOCS3, STAT3, JAK1/2, SMAD3, BCL2, CAV1 & SPECC1) are manipulated by the viral proteins (nsp2, PL-PRO, N-protein, ORF7a, M-S-ORF3a complex, nsp7-nsp8-nsp9-RdRp complex) to mediate host immune evasion. Thus, the replicative machinery of SARS-CoV-2 is evolving fast to evade host challenges, which needs to be considered when developing effective treatment strategies.
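To make the notion of "unique SNPs" in a subset of isolates concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the genomes are already aligned to equal length and reports columns where an isolate in a chosen group carries a base that no isolate outside the group has. The function name `unique_snps`, the isolate labels, and the toy sequences are all hypothetical.

```python
def unique_snps(alignment, group):
    """Find bases that occur in `group` isolates but in no isolate outside it.

    alignment: dict mapping isolate name -> aligned nucleotide sequence
    group:     set of isolate names to test (e.g. isolates from one country)
    Returns {1-based position: {isolate: base}}.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")

    outside = [name for name in alignment if name not in group]
    hits = {}
    for pos in range(length):
        # Bases seen outside the group at this column (gaps and N ignored).
        background = {alignment[name][pos] for name in outside} - {"-", "N"}
        for name in group:
            base = alignment[name][pos]
            if base in "ACGT" and background and base not in background:
                hits.setdefault(pos + 1, {})[name] = base
    return hits


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = {
    "ref":   "ATGGCTAAC",
    "iso_A": "ATGGCTAAC",
    "iso_B": "ATGACTAAC",
    "iso_C": "ATGGCTATC",
}
toy_hits = unique_snps(toy_alignment, {"iso_B", "iso_C"})
```

In this toy case `toy_hits` flags position 4 for `iso_B` and position 8 for `iso_C`. A real analysis would also need to handle ambiguity codes, alignment gaps, and sequencing errors, which this sketch largely ignores.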

SciScore for 10.1101/2020.03.25.006213:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Selection of genomes and annotation: Sequences of different strains were downloaded from NCBI database https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/ (Table 1).	NCBI suggested: (NCBI, RRID:SCR_006472)
Further the genomes were annotated using Prokka [22].	Prokka suggested: (Prokka, RRID:SCR_014732)
Further the GC content information was generated using QUAST standalone tool [23].	QUAST suggested: (QUAST, RRID:SCR_001228)
The orthologous gene clusters were aligned using MUSCLE v3.8 [24] and further processed for removing stop codons using HyPhy v2.2.4 [25].	MUSCLE suggested: (MUSCLE, RRID:SCR_011812) HyPhy suggested: (HyPhy, RRID:SCR_016162)
Single-Likelihood Ancestor Counting (SLAC) method in Datamonkey v2.0 [26] (http://www.datamonkey.org/slac) was used to calculate dN/dS value for each orthologous gene cluster.	Datamonkey suggested: (DataMonkey, RRID:SCR_010278)
The dN/dS values were plotted in R (R Development Core Team, 2015).	R Development Core suggested: (R Project for Statistical Computing, RRID:SCR_001905)
Phylogenetic analysis: To infer the phylogeny, the core gene alignment was generated using MAFFT [27] present within the Roary Package [28].	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Since, none of the SARS-CoV-2 genomes are updated in any protein database, we first annotated the genes using BLASTp tool [34].	BLASTp suggested: (BLASTP, RRID:SCR_001010)
STRING v10.5 [36] and IntAct [37] for predicting their interaction against host proteins.	IntAct suggested: (IntAct, RRID:SCR_006944)
Functional enrichment analysis: Next, functional studies were performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [41, 42] and Gene Ontology (GO) enrichment analyses using UniProt database [43] to evaluate the biological relevance and functional pathways of the HCoV-associated proteins.	KEGG suggested: (KEGG, RRID:SCR_012773) UniProt suggested: (UniProtKB, RRID:SCR_004426)
All functional analyses were performed using STRING enrichment and STRINGify, plugin of Cytoscape v.	STRING suggested: (STRING, RRID:SCR_005223) Cytoscape suggested: (Cytoscape, RRID:SCR_003032)
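Two of the steps quoted in the table, GC content calculation and stop codon removal before dN/dS estimation, can be illustrated with a short sketch. This is not how QUAST or HyPhy work internally. It only shows the quantities involved, under the simplifying assumption that only a terminal stop codon needs stripping. The function names `gc_content` and `strip_terminal_stop` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gc_content(seq):
    """Return percent G+C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def strip_terminal_stop(cds):
    """Drop a trailing stop codon from an in-frame coding sequence.

    Simplification: internal stop codons are not handled here.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] in STOP_CODONS:
        return cds[:-3]
    return cds


# Toy coding sequence (made up): ATG GCT AAC TAA
toy_cds = "ATGGCTAACTAA"
toy_gc = gc_content(toy_cds)                # 4 of 12 bases are G or C
toy_trimmed = strip_terminal_stop(toy_cds)  # "ATGGCTAAC"
```

Stop codons are removed before codon-based selection analysis because codon substitution models are defined over sense codons only, so a stop codon left in the alignment would typically cause the dN/dS estimation to fail or be skipped.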

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.

Related articles

Dengue Virus Type 2: Global Epidemiology, Molecular Evolution, and Immune Response Insights

DIVERSITY AND CLINICAL CORRELATIONS OF SARS-CoV-2 VARIANT DURING THE INTRODUCTION OF THE DELTA VARIANT IN GUATEMALA

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.