Evolution and variation of 2019-novel coronavirus


Abstract

Background

The current outbreak caused by the novel coronavirus (2019-nCoV) in China has become a worldwide concern. As of 28 January 2020, there were 4631 confirmed cases and 106 deaths, and 11 countries or regions were affected.

Methods

We downloaded the genomes of 2019-nCoVs and similar isolates from the Global Initiative on Sharing Avian Influenza Database (GISAID) and the nucleotide database of the National Center for Biotechnology Information (NCBI). Lasergene 7.0 and MEGA 6.0 were used to calculate genetic distances between the sequences, to construct phylogenetic trees, and to align amino acid sequences. Bayesian coalescent phylogenetic analysis, implemented in the BEAST software package, was used to estimate molecular clock characteristics such as the nucleotide substitution rate and the time to the most recent common ancestor (tMRCA) of 2019-nCoVs.

Results

An isolate numbered EPI_ISL_403928 differed from the other 2019-nCoVs in its phylogenetic trees and genetic distances for the whole-length genome and for the coding sequences (CDS) of the polyprotein (P), spike protein (S), and nucleoprotein (N). There are 22, 4, and 2 variations in P, S, and N, respectively, at the level of amino acid residues. The nucleotide substitution rates (nucleotide substitutions/site/year, with the 95% HPD interval in parentheses), from high to low, are 1.05 × 10⁻² (6.27 × 10⁻⁴, 2.72 × 10⁻²) for N, 5.34 × 10⁻³ (5.10 × 10⁻⁴, 1.28 × 10⁻²) for S, 1.69 × 10⁻³ (3.94 × 10⁻⁴, 3.60 × 10⁻³) for P, and 1.65 × 10⁻³ (4.47 × 10⁻⁴, 3.24 × 10⁻³) for the whole genome. At these nucleotide substitution rates, the most recent common ancestor of 2019-nCoVs appeared about 0.253-0.594 years before the epidemic.
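To put the reported figures on a more familiar scale, the short sketch below converts the whole-genome substitution rate into expected substitutions per genome per year and the tMRCA range into months. It is only illustrative arithmetic: the genome length of about 29,900 nucleotides is an assumption not stated in the abstract, and the function names are hypothetical.

```python
# Illustrative arithmetic on the values quoted in the abstract.
# ASSUMPTION: genome length of ~29,900 nt (approximate; not given in the abstract).
GENOME_LENGTH_NT = 29_900
MONTHS_PER_YEAR = 12

# Whole-genome rate and 95% HPD bounds (substitutions/site/year), from the abstract.
WHOLE_GENOME_RATE = 1.65e-3
WHOLE_GENOME_HPD = (4.47e-4, 3.24e-3)

# tMRCA range in years before the epidemic, from the abstract.
TMRCA_YEARS = (0.253, 0.594)


def substitutions_per_year(rate_per_site, genome_length):
    """Expected substitutions per genome per year for a per-site rate."""
    return rate_per_site * genome_length


def years_to_months(years):
    """Convert a duration in years to months."""
    return years * MONTHS_PER_YEAR


if __name__ == "__main__":
    mean_subs = substitutions_per_year(WHOLE_GENOME_RATE, GENOME_LENGTH_NT)
    low_subs, high_subs = (
        substitutions_per_year(r, GENOME_LENGTH_NT) for r in WHOLE_GENOME_HPD
    )
    print(f"substitutions/genome/year: {mean_subs:.1f} "
          f"(95% HPD {low_subs:.1f} to {high_subs:.1f})")

    low_m, high_m = (years_to_months(y) for y in TMRCA_YEARS)
    print(f"tMRCA: {low_m:.1f} to {high_m:.1f} months before the epidemic")
```

Under that assumed genome length, the whole-genome rate corresponds to roughly 49 substitutions per genome per year, and the tMRCA range corresponds to roughly 3 to 7 months.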

Conclusion

Our analysis suggests that at least two different viral strains of 2019-nCoV are involved in this outbreak, which might have begun a few months before it was officially reported.

Article activity feed

  1. SciScore for 10.1101/2020.01.30.926477:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: The former is a reference sequence (RefSeq) of pathogens that have caused major epidemics [10], while the latter two are the wild strains with the highest identity with 2019-nCoVs [11].
    Resource: RefSeq, suggested: (RefSeq, RRID:SCR_003496)

    Sentence: Pairwise distances were calculated by using the DNADIST program of the PHYLIP package and by using the Kimura two-parameter model of nucleotide substitution.
    Resource: PHYLIP, suggested: (PHYLIP, RRID:SCR_006244)

    Sentence: All phylogenetic trees were inferred by using MEGA6 software [13].
    Resource: MEGA6, suggested: (MEGA Software, RRID:SCR_000667)

    Sentence: Computation of mean evolutionary rate and the most recent common ancestor (tMRCA): Bayesian coalescent phylogenetic analysis, implemented in the BEAST v1.6.1 (http://beast.bio.ed.ac.uk) software package, was used to determine the molecular evolutionary rate [15].
    Resource: BEAST, suggested: (BEAST, RRID:SCR_010228)

    Sentence: Model selection was based on an analysis of marginal likelihoods, calculated in Tracer version 1.5.
    Resource: Tracer, suggested: (Tracer, RRID:SCR_019121)
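
For readers unfamiliar with the Kimura two-parameter model named in the sentences quoted above, the following is a minimal sketch of the textbook closed-form distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. It is not a reimplementation of PHYLIP's DNADIST, whose results may differ in detail. The function name is hypothetical, and the handling of gaps and ambiguous bases (skipping them) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    ASSUMPTION: sites with gaps or ambiguous bases in either sequence are
    skipped; other tools may treat them differently.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    term_a = 1 - 2 * p - q
    term_b = 1 - 2 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")

    return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For closely related genomes, as in this study, P and Q are small and the corrected distance stays close to the raw proportion of differing sites.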

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.