Comparative Genomic Analysis of Rapidly Evolving SARS-CoV-2 Reveals Mosaic Pattern of Phylogeographical Distribution


Abstract

The COVID-19 pandemic continues to spread across the globe, with over 6.5 million cases reported worldwide. The severity of the disease varies across territories and is mainly influenced by population density and age. In this study, we analyzed the transmission pattern of 95 SARS-CoV-2 genomes isolated from 11 different countries. Our study also revealed several nonsynonymous mutations in ORF1b and the S-protein, along with their impact on protein structural stability. Using a SARS-CoV-2–human protein interactome, our analysis showed how viral proteins manipulate the host system, which can be useful for understanding the impact of the virus on human health.

Article activity feed

  1. SciScore for 10.1101/2020.03.25.006213

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Selection of genomes and annotation: Sequences of different strains were downloaded from NCBI database https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/ (Table 1).
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    Further, the genomes were annotated using Prokka [22].
    Prokka
    suggested: (Prokka, RRID:SCR_014732)
    Further, GC content information was generated using the QUAST standalone tool [23].
    QUAST
    suggested: (QUAST, RRID:SCR_001228)
    The orthologous gene clusters were aligned using MUSCLE v3.8 [24] and further processed for removing stop codons using HyPhy v2.2.4 [25].
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    HyPhy
    suggested: (HyPhy, RRID:SCR_016162)
    Single-Likelihood Ancestor Counting (SLAC) method in Datamonkey v2.0 [26] (http://www.datamonkey.org/slac) was used to calculate dN/dS value for each orthologous gene cluster.
    Datamonkey
    suggested: (DataMonkey, RRID:SCR_010278)
    The dN/dS values were plotted in R (R Development Core Team, 2015).
    R Development Core
    suggested: (R Project for Statistical Computing, RRID:SCR_001905)
    Phylogenetic analysis: To infer the phylogeny, the core gene alignment was generated using MAFFT [27] present within the Roary Package [28].
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Since none of the SARS-CoV-2 genomes had yet been included in any protein database, we first annotated the genes using the BLASTp tool [34].
    BLASTp
    suggested: (BLASTP, RRID:SCR_001010)
    STRING v10.5 [36] and IntAct [37] for predicting their interactions with host proteins.
    IntAct
    suggested: (IntAct, RRID:SCR_006944)
    Functional enrichment analysis: Next, functional studies were performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [41, 42] and Gene Ontology (GO) enrichment analyses using the UniProt database [43] to evaluate the biological relevance and functional pathways of the HCoV-associated proteins.
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    All functional analyses were performed using STRING enrichment and STRINGify, a plugin of Cytoscape v.
    STRING
    suggested: (STRING, RRID:SCR_005223)
    Cytoscape
    suggested: (Cytoscape, RRID:SCR_003032)
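    The dN/dS step in the table above was run in Datamonkey with SLAC, which counts synonymous and nonsynonymous substitutions along a phylogeny using reconstructed ancestral sequences. As a simplified illustration of the quantity being estimated, and not a reimplementation of SLAC or of the authors' pipeline, the Python sketch below computes a pairwise dN/dS ratio for two codon-aligned sequences using Nei-Gojobori-style counting with a Jukes-Cantor correction. The function names (`syn_sites`, `path_differences`, `jukes_cantor`, `dn_ds`) are hypothetical; the sketch assumes the standard genetic code, skips codons containing gaps, ambiguous bases or stop codons, and averages differences over mutational pathways that avoid intermediate stop codons.

```python
import itertools
import math

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites of a codon: 1/3 for each single-base change that keeps the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                sites += 1 / 3
    return sites


def path_differences(c1, c2):
    """Average synonymous/nonsynonymous differences over all mutational pathways
    from c1 to c2, ignoring pathways that pass through an intermediate stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    counts = []
    for order in itertools.permutations(positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for step, pos in enumerate(order):
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*" and step < len(order) - 1:
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            counts.append((syn, nonsyn))
    if not counts:  # every pathway hits a stop codon: ignore this codon pair
        return 0.0, 0.0
    return (sum(s for s, _ in counts) / len(counts),
            sum(n for _, n in counts) / len(counts))


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p reaches saturation (0.75)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Pairwise dN/dS for two codon-aligned coding sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be codon-aligned and of equal length")
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped/ambiguous codons and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = path_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    if s_sites == 0 or n_sites == 0:
        return math.nan
    d_s = jukes_cantor(s_diff / s_sites)
    d_n = jukes_cantor(n_diff / n_sites)
    if d_s == 0:
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s


if __name__ == "__main__":
    # One synonymous (CTG->CTA, Leu) and one nonsynonymous (AAA->AGA, Lys->Arg) difference.
    print(round(dn_ds("ATGCTGAAA", "ATGCTAAGA"), 3))
```

    In the usage example, the single nonsynonymous difference is spread over many more nonsynonymous than synonymous sites, so the ratio comes out well below 1 (about 0.16). Values below 1 are conventionally read as purifying selection and values above 1 as positive selection, although short pairwise comparisons like this one are very noisy.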

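    The functional enrichment rows in the table above rely on STRING enrichment with KEGG and GO annotations. A common way to test whether a protein set is over-represented in an annotation term is a one-sided hypergeometric test, followed by Benjamini-Hochberg correction across all tested terms. The Python sketch below shows that test on toy data; the gene identifiers, term names and functions (`hypergeom_sf`, `benjamini_hochberg`, `enrich`) are hypothetical, and STRING's own implementation (choice of background, term filtering) may differ.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric: `draws` items sampled without replacement
    from `population` items, of which `successes` carry the annotation."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(study, background, annotations):
    """Test each annotation term for over-representation in `study`.
    Returns (term, hits, term_size, p, q) tuples sorted by p-value."""
    background = set(background)
    study = set(study) & background
    rows = []
    for term, members in annotations.items():
        members = set(members) & background
        hits = len(study & members)
        if hits == 0:
            continue  # terms with no study proteins are not tested here
        p = hypergeom_sf(hits, len(background), len(members), len(study))
        rows.append((term, hits, len(members), p))
    qvals = benjamini_hochberg([row[3] for row in rows])
    return sorted((row + (q,) for row, q in zip(rows, qvals)), key=lambda r: r[3])


if __name__ == "__main__":
    # Toy data with hypothetical identifiers: 20 background proteins, two terms.
    background = [f"G{i}" for i in range(1, 21)]
    annotations = {
        "term_A": {f"G{i}" for i in range(1, 6)},
        "term_B": {f"G{i}" for i in range(6, 16)},
    }
    study = {"G1", "G2", "G3", "G6"}
    for term, hits, size, p, q in enrich(study, background, annotations):
        print(f"{term}: {hits}/{size} p={p:.4f} q={q:.4f}")
```

    With the toy data, term_A (3 of its 5 members fall in a 4-protein study set) gets p of about 0.032 and q of about 0.064 after correction, while term_B is not enriched. The choice of background set strongly affects these p-values, which is why enrichment tools usually let the user specify it.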
    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see the Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.