Infection Groups Differential (IGD) Score Reveals Infection Ability Difference between SARS-CoV-2 and Other Coronaviruses

Abstract

The coronavirus disease 2019 (COVID-19) pandemic, which began in late December 2019, has resulted in millions of diagnosed cases worldwide. Reports have shown that SARS-CoV-2 has a markedly higher infection rate than other coronaviruses. This study conducted a phylogenetic analysis of 91 representative coronaviruses and found that the functional spike protein of SARS-CoV-2, which interacts with the human receptor ACE2, is not under distinct selection pressure compared with other coronaviruses. Furthermore, we define a new measurement, the infection group differential (IGD) score, to assess the infection ability of two human coronavirus groups. Among 40 high IGD (hIGD) sites, nine extremely high IGD (ehIGD) sites in the receptor-binding domain (RBD) exhibit a unique infection-related pattern in the haplotype network and docking energy comparisons. The 40 hIGD sites are largely conserved among SARS-CoV-2 samples: only two hIGD sites are mutated, in four of 1,058 samples, and these are defined as rare-mutation hIGD (rhIGD) sites. In conclusion, ehIGD and rhIGD sites may be of great significance for vaccine development.

Article activity feed

  1. SciScore for 10.1101/2020.05.12.090324:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences / Resources
    Homologous sequence searching of SARS-like coronaviruses spike protein was first performed by BLASTP (v 2.2.29+) [14] in NR (Non-Redundant Protein Sequence Database), and 868 sequences were selected under the condition of sequence length > 600 aa and a sequence identity > 30%.
    BLASTP
    suggested: (BLASTP, RRID:SCR_001010)
    Phylogenetic analysis: The whole-genome DNA sequences and related protein sequences of 91 coronaviruses were aligned using mafft v7.455 [15], and the resulting multiple sequence alignment was trimmed of poorly aligned positions with Gblocks 0.91b [16].
    mafft
    suggested: (MAFFT, RRID:SCR_011811)
    RAxML v8.2.12 [17] was used to build the maximum likelihood phylogenetic tree of genomes with the parameters “-m GTRCAT” and protein with the parameters “-m PROTGAMMAILGX”.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    Codon usage bias analysis: CodonW v1.3 [19] was used to calculate the universal index value of each codon of the CDS of coronavirus functional proteins.
    CodonW
    suggested: None
    Ka/Ks analysis: Ka/Ks ratios were calculated using KaKs_Calculator 2.0 [20] and used in the analysis of selection pressure.
    KaKs_Calculator
    suggested: None
    Haplotype network analysis: DnaSP v6.12.03 [21] was used to generate multi-sequence aligned haplotype data.
    DnaSP
    suggested: (DnaSP, RRID:SCR_003067)
    Arlequin v3.5.2.2 [22] was used to estimate haplotype frequency.
    Arlequin
    suggested: (ARLEQUIN, RRID:SCR_009051)
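
The selection and haplotype steps quoted in the table above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, its field names, and both helper functions are hypothetical, and the frequency function is a plain counting stand-in rather than the estimation procedure used by Arlequin. Only the thresholds (sequence length > 600 aa, sequence identity > 30%) come from the quoted methods sentence.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit. Field names are hypothetical, not BLAST's own columns."""
    accession: str
    length_aa: int        # length of the hit sequence in amino acids
    identity_pct: float   # percent identity to the query spike protein


def filter_hits(hits: Iterable[Hit], min_length: int = 600,
                min_identity: float = 30.0) -> list[Hit]:
    """Keep hits with length > min_length and identity > min_identity.

    Both comparisons are strict, matching the ">" in the quoted sentence.
    """
    return [h for h in hits
            if h.length_aa > min_length and h.identity_pct > min_identity]


def haplotype_frequencies(aligned: dict[str, str]) -> dict[str, float]:
    """Relative frequency of each distinct aligned sequence (haplotype).

    Assumption: every sample carries one fully resolved sequence, so simple
    counting is enough. This does not reproduce Arlequin's estimators.
    """
    if not aligned:
        return {}
    counts = Counter(aligned.values())
    total = sum(counts.values())
    return {haplotype: n / total for haplotype, n in counts.items()}


if __name__ == "__main__":
    demo_hits = [Hit("hit_a", 1273, 99.0), Hit("hit_b", 600, 80.0)]
    print(filter_hits(demo_hits))
    print(haplotype_frequencies({"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}))
```

A hit of exactly 600 aa or exactly 30% identity is excluded here because the thresholds are read as strict inequalities; whether the original analysis treated the boundaries that way is not stated in the source.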

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.