Using nucleocapsid proteins to investigate the relationship between SARS-CoV-2 and closely related bat and pangolin coronaviruses



Abstract

An initial outbreak of coronavirus disease 2019 (COVID-19) in China has resulted in a massive global pandemic causing well over 16,500,000 cases and 650,000 deaths worldwide. The virus responsible, SARS-CoV-2, has been found to be very closely related to Bat-CoV RaTG13 and Pangolin-CoV MP789. The nucleocapsid protein can serve as a suitable model for determining phylogenetic, evolutionary, and structural relationships between coronaviruses. Therefore, this study uses the nucleocapsid gene and protein to further investigate the relationship between SARS-CoV-2 and closely related bat and pangolin coronaviruses. Sequence and phylogenetic analyses revealed that the nucleocapsid gene and protein in SARS-CoV-2 are both closely related to those found in Bat-CoV RaTG13 and Pangolin-CoV MP789. Evidence of recombination was detected within the N gene, along with a double amino acid insertion in the N-terminal region. Homology modeling of the N-terminal domain revealed similar structures but distinct electrostatic surfaces and topological variations in the β-hairpin that likely reflect specific adaptive functions. With respect to SARS-CoV-2, two amino acids (S37 and A267) were found to exist only in its N protein, along with an extended β-hairpin that bends towards the nucleotide-binding site. Collectively, this study strengthens the evidence for the relationship among SARS-CoV-2, Bat-CoV RaTG13, and Pangolin-CoV MP789, providing additional insights into the structure and adaptive nature of the nucleocapsid protein found in these coronaviruses. Furthermore, these data will enhance our understanding of the complete history behind SARS-CoV-2 and assist in antiviral and vaccine development.

Article activity feed

  1. SciScore for 10.1101/2020.06.25.172312:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Sequence Acquisition and Identity Analysis: Nucleocapsid gene and protein sequences in SARS-CoV-2, pangolin CoVs, and bat CoVs were retrieved from the NCBI RefSeq, Nucleotide, and Protein databases.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    Phylogenetic and Recombination Analysis: Sequences were aligned using MUSCLE in MEGA-X v10.1.7 [33,34]. The default alignment settings were retained, except for the clustering methods, which were both changed to neighbor joining.
    MEGA-X
    suggested: None
    Using those sequences, nucleotide and amino acid identities for the NTD and CTD were calculated by MatGAT v2.01 to evaluate the identity shared among each domain.
    MatGAT
    suggested: None
    Gene sequences were aligned using MUSCLE, after which the alignment was used by SimPlot v3.5.1 to generate a nucleotide similarity plot for detecting potential recombination events.
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    SimPlot
    suggested: None
    This method does not guarantee that phosphorylation occurs but scores the likelihood that phosphorylation would occur at any given serine, threonine, and/or tyrosine residue within a protein. Homology Modeling: Homology models were built with MODELLER v9.23 using the NTD of SARS-CoV (PDB: 2OFZ) [42,43]. Ten models were built for each sequence, and the model with the lowest discrete optimized protein energy (DOPE) score was chosen.
    MODELLER
    suggested: (MODELLER, RRID:SCR_008395)
    Then, charges were assigned using the parameters designated for standard residues in the AMBER ff14SB option.
    AMBER
    suggested: (AMBER, RRID:SCR_016151)
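
    For readers unfamiliar with the identity and similarity-plot steps quoted above, the following is a minimal Python sketch of the underlying arithmetic. It is not the MatGAT or SimPlot implementation: it assumes the two sequences are already aligned to equal length (for example, exported from a MUSCLE alignment), it excludes gapped columns from the denominator (tools differ on this convention), and it uses raw percent identity rather than a corrected distance model. The function names `percent_identity` and `sliding_identity`, and the window and step values, are hypothetical choices made for illustration.

```python
from typing import List, Tuple

GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: `a` and `b` come from the same alignment, so they have
    equal length. Gapped columns are skipped entirely.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP or y == GAP:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def sliding_identity(
    query: str, ref: str, window: int = 200, step: int = 20
) -> List[Tuple[int, float]]:
    """Return (window midpoint, percent identity) pairs along an alignment.

    A drop in identity to one reference alongside a rise to another is the
    kind of pattern a similarity plot is inspected for.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    last_start = max(len(query) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(query))
        midpoint = (start + end) // 2
        points.append((midpoint, percent_identity(query[start:end], ref[start:end])))
    return points


if __name__ == "__main__":
    # Toy aligned sequences, not real N gene data.
    print(percent_identity("ATG-AT", "ATGCAA"))
    print(sliding_identity("AAAATTTT", "AAAAGGGG", window=4, step=4))
```

    In practice the same windowed calculation would be repeated for the query against each reference sequence, and the resulting curves plotted together.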

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 23. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.