Comparative Genomics and Integrated Network Approach Unveiled Undirected Phylogeny Patterns, Co-mutational Hotspots, Functional Crosstalk and Regulatory Interactions in SARS-CoV-2

Abstract

The SARS-CoV-2 pandemic resulted in 92 million cases within one year. This study focuses on understanding the population-specific variations that contribute to the high rate of infection in specific geographical regions, particularly the USA. Rigorous phylogenomic network analysis of 245 complete SARS-CoV-2 genomes inferred five central clades, named a (ancestral), b, c, d and e (subtypes e1 and e2). Clades d and e2 were found to consist exclusively of USA strains. The clades were distinguished by 10 co-mutational combinations in Nsp3, ORF8, Nsp13, S, Nsp12, Nsp2 and Nsp6. Our analysis revealed that only 67.46% of SNP mutations resulted in changes at the amino acid level. The T1103P mutation in Nsp3 was predicted to increase protein stability in 238 strains, the exceptions being 6 strains that were marked as the ancestral type, whereas the co-mutation (P409L and Y446C) in Nsp13 was found in 64 genomes from the USA, with 100% co-occurrence. Docking indicated that the D614G mutation reduced binding of the Spike protein to ACE2 but improved its interaction with the TMPRSS2 receptor, contributing to high transmissibility among USA strains. We also found that the host proteins MYO5A, MYO5B and MYO5C had the most interactions with viral proteins (N, S, M). Thus, blocking the internalization pathway by inhibiting MYO5 proteins could be an effective approach for COVID-19 treatment. The functional annotations of the host-pathogen interaction (HPI) network were closely associated with hypoxia and thrombotic conditions, confirming the vulnerability and severity of infection. We also screened CpG islands in Nsp1 and N, which confer on SARS-CoV-2 the ability to enter the host cell and trigger ZAP activity.
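The "100% co-occurrence" reported for P409L and Y446C can be made concrete with a small calculation. The following is a minimal sketch, not the authors' pipeline: it assumes mutations have already been called per genome, and the function name `co_occurrence`, the `protein:mutation` labels and the toy strains are hypothetical. It scores each mutation pair by the fraction of genomes carrying either mutation that carry both.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(genome_mutations):
    """Score every mutation pair by (genomes with both) / (genomes with either).

    genome_mutations maps a genome ID to the set of mutations called in it.
    A score of 1.0 means the two mutations are never seen apart.
    """
    single = Counter()  # genomes carrying each mutation
    pair = Counter()    # genomes carrying both mutations of a pair
    for mutations in genome_mutations.values():
        ordered = sorted(set(mutations))  # stable pair keys
        single.update(ordered)
        pair.update(combinations(ordered, 2))

    scores = {}
    for (a, b), both in pair.items():
        either = single[a] + single[b] - both
        scores[(a, b)] = both / either
    return scores


# Hypothetical toy data, for illustration only (not taken from the study).
toy = {
    "strain_1": {"Nsp13:P409L", "Nsp13:Y446C", "S:D614G"},
    "strain_2": {"Nsp13:P409L", "Nsp13:Y446C"},
    "strain_3": {"S:D614G"},
    "strain_4": set(),
}
scores = co_occurrence(toy)
print(scores[("Nsp13:P409L", "Nsp13:Y446C")])
```

In this toy set the Nsp13 pair always appears together, while D614G also occurs without it, so the two pairs receive different scores. Other definitions of co-occurrence (for example, conditional frequency of one mutation given the other) are equally plausible; the abstract does not say which was used.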

Importance

In the current study, we present a global view of the mutational patterns observed in SARS-CoV-2 transmission. This provides a who-infects-whom geographical model since the early pandemic. This is hitherto the most comprehensive comparative genomics analysis of full-length genomes for co-mutations across different geographical regions, especially in USA strains. Computational structural biology results suggest that the mutations exert a balance of opposing effects on pathogenicity, indicating that only a few of the mutations, not all, are effective at the translational level. Novel HPI analysis and CpG predictions provide proof of concept for the hypoxia and thrombotic conditions seen in several patients. Thus, the current study focuses on understanding the population-specific variations that contribute to the high rate of SARS-CoV-2 infection in specific geographical regions, which may eventually be vital to the most severely affected countries and regions for the rapid development of tailored mitigation strategies.
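The CpG predictions mentioned above depend on how a CpG island is defined, which this page does not state. As a hedged illustration, the sketch below applies the widely used criteria of a window of at least 200 nt, GC content of at least 50% and an observed/expected CpG ratio of at least 0.6. These thresholds, the function names `cpg_stats` and `cpg_windows`, and the synthetic test sequence are assumptions, not details taken from the study.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Return (start, end) for every window that meets both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        if gc_fraction >= min_gc and obs_exp >= min_obs_exp:
            hits.append((start, start + window))
    return hits


# Synthetic sequence for illustration: a CpG-rich half followed by an AT-only half.
demo = "CG" * 100 + "AT" * 100
hits = cpg_windows(demo, window=200, step=50)
print(hits)
```

Overlapping hits would normally be merged into single islands before reporting; that step is omitted here to keep the sketch short.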

Article activity feed

  1. SciScore for 10.1101/2020.06.20.162560:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The alignments so obtained were processed for phylogeny construction using BioEdit software (18).
    BioEdit
    suggested: (BioEdit, RRID:SCR_007361)
    Data and Computer programs: The genomic analytics is performed using programs in Python and Biopython libraries (22).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    To find the Host Pathogen Interaction (HPI), we subjected SARS-CoV-2 proteins sequence to Host-Pathogen interaction databases such as Viruses STRING v10.5 (24) and HPIDB3.0 (25) to predict their direct interaction with human as the principal host.
    STRING
    suggested: (STRING, RRID:SCR_005223)
    In these databases, the virus–host interaction was imported from different PPI databases like MintAct (26), IntAct (26), HPIDB (25) and VirusMentha (27).
    IntAct
    suggested: (IntAct, RRID:SCR_006944)
    For high-throughput analysis, it searches multiple protein sequences at a time using BLASTp and obtain results in tabular and sequence alignment formats (28).
    BLASTp
    suggested: (BLASTP, RRID:SCR_001010)
    , plugin of Cytoscape v3.7.2, we identified the hub protein.
    Cytoscape
    suggested: (Cytoscape, RRID:SCR_003032)
    Gene ontology (GO) analysis was performed using ClueGo (31), selecting the Kyoto Encyclopedia of Genes and Genomes (KEGG) (32)
    ClueGo
    suggested: (ClueGO, RRID:SCR_005748)
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    , Gene Ontology—biological function database, and Reactome Pathways (33) databases.
    Gene Ontology—biological
    suggested: None
    Computational structural analysis on wild-type and mutant SARS-CoV-2 proteins: SARS-CoV-2 proteins sequences were retrieved from the NCBI genome database and pairwise sequence alignment of wild-type and mutant proteins were carried out by the Clustal Omega tool (34).
    Clustal Omega
    suggested: (Clustal Omega, RRID:SCR_001591)
    The docking studies for wild and mutant SARS-CoV-2 proteins with host proteins was carried out using PatchDock Server (40)
    PatchDock
    suggested: (PatchDock, RRID:SCR_017589)
    The presence of common CpG islands was confirmed by performing BLAST using the above reference strain.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.