Comparative Genomic Analyses Reveal a Specific Mutation Pattern Between Human Coronavirus SARS-CoV-2 and Bat-CoV RaTG13

Abstract

The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Wuhan, China, rapidly grew into a global pandemic. How SARS-CoV-2 evolved remains unclear.

Methods

We performed a comprehensive analysis using the available genomes of SARS-CoV-2 and its closely related coronaviruses.

Results

The ratio of nucleotide substitutions to amino acid substitutions in the spike gene between SARS-CoV-2 WIV04 and Bat-CoV RaTG13 (9.07) was markedly higher than that between other coronaviruses (range, 1.29–4.81). The ratio of non-synonymous to synonymous substitution rates (dN/dS) between SARS-CoV-2 WIV04 and Bat-CoV RaTG13 was the lowest among all comparisons performed, suggesting evolution under stringent selective pressure. Notably, the relative proportion of the T:C transition was markedly higher between SARS-CoV-2 WIV04 and Bat-CoV RaTG13 than between the other coronaviruses compared. Codon usage is similar across these coronaviruses and is unlikely to explain the increased number of synonymous mutations. Moreover, some sites of the spike protein might be subject to positive selection.

Conclusions

Our results showed an increased proportion of synonymous substitutions and of the T:C transition between SARS-CoV-2 and RaTG13. Further investigation of the mechanism behind this mutation pattern would contribute to understanding viral pathogenicity and adaptation to hosts.
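
To make the two summary statistics in the abstract concrete, the following is a minimal Python sketch of how a nucleotide-to-amino-acid substitution ratio and a T:C share could be tallied from two aligned coding sequences. It is not the authors' pipeline. The function names (`compare_cds`, `summarize`), the toy sequences, and the choice to count one amino acid substitution per codon whose translation differs are all assumptions made for illustration. The sketch also assumes gap-free, in-frame sequences with only A, C, G and T.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def compare_cds(seq_a, seq_b):
    """Count nucleotide and amino acid differences between two aligned CDS.

    Assumption: both sequences are gap-free, in frame, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, multiple of 3")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    nt_diffs = Counter()  # keyed by unordered base pair, e.g. "CT" for T:C
    aa_diffs = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        for x, y in zip(codon_a, codon_b):
            if x != y:
                nt_diffs["".join(sorted((x, y)))] += 1
        if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            aa_diffs += 1
    return nt_diffs, aa_diffs


def summarize(seq_a, seq_b):
    """Return substitution counts, their ratio, and the T:C share."""
    nt_diffs, aa_diffs = compare_cds(seq_a, seq_b)
    total_nt = sum(nt_diffs.values())
    return {
        "nt_substitutions": total_nt,
        "aa_substitutions": aa_diffs,
        "nt_to_aa_ratio": total_nt / aa_diffs if aa_diffs else None,
        "tc_share": nt_diffs["CT"] / total_nt if total_nt else None,
    }


# Toy sequences (hypothetical, not taken from any viral genome).
stats = summarize("ATGTTTCTGGGA", "ATGTTCCTAGAA")
print(stats)
```

In the toy pair, three nucleotides differ but only one codon changes its amino acid, so the ratio is 3. A high ratio of this kind indicates that most nucleotide differences are synonymous.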

Article activity feed

  1. SciScore for 10.1101/2020.02.27.969006:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Phylogenetic analysis: Genome sequences were aligned using MUSCLE v3.8.31 (9), followed by manual adjustment using BioEdit v7.2.5.
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    BioEdit
    suggested: (BioEdit, RRID:SCR_007361)
    Phylogenetic analyses of complete genome were performed using maximum-likelihood method and general time-reversible model of nucleotide substitution with gamma-distributed rates among sites (GTR+G) in RAxML v8.1.21 (10).
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    Phylogenetic analyses of coding sequences were performed using MEGA-X software.
    MEGA-X
    suggested: None
    Estimation of synonymous and non-synonymous substitution rates: The number of synonymous substitutions per synonymous site (dS), and the number of non-synonymous substitutions per non-synonymous site (dN), for each coding region were calculated using the Nei-Gojobori method (Jukes-Cantor) in PAML package (12).
    PAML
    suggested: (PAML, RRID:SCR_014932)
    The adaptive evolution server (http://www.datamonkey.org/) was used to identify eventual sites of positive selection.
    http://www.datamonkey.org/
    suggested: (DataMonkey, RRID:SCR_010278)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to state so explicitly.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
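
The dN and dS estimation quoted in Table 2 names the Nei-Gojobori method with a Jukes-Cantor correction. As a rough illustration of the correction step only, the Python sketch below converts observed proportions of synonymous and non-synonymous differences into dS and dN. It does not reproduce PAML, and it skips the Nei-Gojobori counting of synonymous and non-synonymous sites. The function names (`jukes_cantor`, `dn_ds`) and the input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differing sites p.

    d = -(3/4) * ln(1 - 4p/3), defined only for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dN, dS, dN/dS) from difference and site counts.

    The counts are assumed to come from a separate Nei-Gojobori style
    tally, which this sketch does not implement.
    """
    ds = jukes_cantor(syn_diffs / syn_sites)
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return dn, ds, (dn / ds if ds > 0 else None)


# Hypothetical counts chosen only to exercise the functions.
dn, ds, omega = dn_ds(syn_diffs=30, syn_sites=300, nonsyn_diffs=9, nonsyn_sites=900)
print(f"dN={dn:.4f} dS={ds:.4f} dN/dS={omega:.4f}")
```

A dN/dS value well below 1, as with these made-up counts, is the pattern usually read as purifying selection. The correction matters most when the proportion of differing sites is large, because multiple hits at the same site are then more likely.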