Codon usage and evolutionary rates of the 2019-nCoV genes



Abstract

Severe acute respiratory syndrome coronavirus 2 (2019-nCoV), which first broke out in Wuhan (China) in December 2019, causes a severe acute respiratory illness with a mortality rate ranging from 3% to 6%. To better understand the evolution of the newly emerging 2019-nCoV, in this paper we analyze its codon usage pattern. For this purpose, we compare the codon usage of 2019-nCoV with that of 30 other viruses belonging to the subfamily Orthocoronavirinae. We found that 2019-nCoV is rich in A and T nucleotides, a composition that strongly influences its codon usage, which appears not to be optimized for the human host. We then study the evolutionary pressures influencing the codon usage and evolutionary rates of five conserved genes characteristic of coronaviruses, which encode the viral replicase, spike, envelope, membrane and nucleocapsid proteins. We found different patterns of mutational bias and natural selection that affect the codon usage of these genes to different extents. Moreover, we show that the two integral membrane proteins (matrix and envelope) tend to evolve slowly, accumulating few nucleotide mutations in their genes. Conversely, the genes encoding the nucleocapsid (N), viral replicase and spike proteins, which are important targets for the development of vaccines and antiviral drugs, tend to evolve faster than the others. Taken together, our results suggest that the higher evolutionary rate observed for these genes could represent a major barrier to the development of antiviral therapeutics against 2019-nCoV.
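The AT richness and codon usage bias described in the abstract can be made concrete with two standard measures: overall AT content, and relative synonymous codon usage (RSCU), where an RSCU above 1 marks a codon used more often than expected if all synonymous codons were used equally. The sketch below is an illustration under stated assumptions, not the authors' in-house script: the abstract does not say which indices were computed, and the names `GENETIC_CODE`, `split_codons`, `at_content` and `rscu` are hypothetical. It assumes an in-frame DNA coding sequence translated with the standard genetic code.

```python
from collections import Counter

# Standard genetic code (DNA alphabet): amino acid -> synonymous codons.
# Stop codons ("*") are treated as one more synonymous family here.
GENETIC_CODE = {
    "F": ["TTT", "TTC"], "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "I": ["ATT", "ATC", "ATA"], "M": ["ATG"], "V": ["GTT", "GTC", "GTA", "GTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "T": ["ACT", "ACC", "ACA", "ACG"], "A": ["GCT", "GCC", "GCA", "GCG"],
    "Y": ["TAT", "TAC"], "*": ["TAA", "TAG", "TGA"], "H": ["CAT", "CAC"],
    "Q": ["CAA", "CAG"], "N": ["AAT", "AAC"], "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"], "E": ["GAA", "GAG"], "C": ["TGT", "TGC"], "W": ["TGG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"], "G": ["GGT", "GGC", "GGA", "GGG"],
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def at_content(seq):
    """Fraction of A+T among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    return (counts["A"] + counts["T"]) / acgt if acgt else float("nan")


def rscu(seq):
    """RSCU = observed codon count / mean count over its synonymous codons.

    Amino acids absent from the sequence are skipped; single-codon amino acids
    (Met, Trp) always get 1.0. Codons containing ambiguous bases are never counted.
    """
    counts = Counter(split_codons(seq))
    values = {}
    for synonyms in GENETIC_CODE.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue
        expected = total / len(synonyms)
        for codon in synonyms:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    cds = "ATGAAAAAGAAATTTTAA"
    print(round(at_content(cds), 3))
    print({c: round(v, 3) for c, v in rscu(cds).items()})
```

In the toy sequence `ATGAAAAAGAAATTTTAA`, lysine is encoded twice by `AAA` and once by `AAG`, so `AAA` receives an RSCU of 4/3 and `AAG` an RSCU of 2/3; on a real genome the same functions would be applied to each coding sequence.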

Article activity feed

  1. SciScore for 10.1101/2020.03.25.006569

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    ModelTest-NG [6] was used to select the best-fit evolutionary model of nucleotide substitution, that is, GTR + G + I.
    ModelTest-NG
    suggested: None
    Software and Algorithms
    Sentences | Resources
    The complete coding genomic sequences of 30 coronaviruses were downloaded from the National Center for Biotechnology Information (NCBI) (available at https://www.ncbi.nlm.nih.gov/).
    https://www.ncbi.nlm.nih.gov/
    suggested: (GENSAT at NCBI - Gene Expression Nervous System Atlas, RRID:SCR_003923)
    To calculate these values we use an in-house Python script.
    Python
    suggested: (IPython, RRID:SCR_001658)
    In this case, the three stop codons (TAA, TAG, and TGA) and the three codons for isoleucine (ATT, ATC, and ATA) were excluded from the calculation of GC3, and the single codons for methionine (ATG) and tryptophan (TGG) were excluded from all three (GC1, GC2, GC3) (Sueoka 1988).
    ATC
    suggested: None
    The protein sequences were aligned using Biopython.
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    The resulting multiple sequence alignment was used to build a phylogenetic tree by employing a maximum-likelihood (ML) method implemented in the software package MEGA version 10.1 [15].
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
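The positional GC exclusion rule quoted in the resources table (Sueoka 1988) can be sketched as follows. This is a hypothetical reconstruction, not the authors' in-house Python script. It follows the quoted sentence literally, so stop codons and isoleucine codons are dropped only from GC3, while ATG and TGG are dropped from all three positions. It also assumes an in-frame coding sequence and skips codons containing ambiguous bases such as N; the names `EXCLUDED_ALL`, `EXCLUDED_GC3` and `positional_gc` are illustrative.

```python
# Excluded from GC1, GC2 and GC3: the single codons for Met (ATG) and Trp (TGG).
EXCLUDED_ALL = {"ATG", "TGG"}
# Additionally excluded from GC3 only: the three stop codons and the three Ile codons.
EXCLUDED_GC3 = EXCLUDED_ALL | {"TAA", "TAG", "TGA", "ATT", "ATC", "ATA"}


def positional_gc(seq):
    """Return GC content at codon positions 1, 2 and 3 (GC1, GC2, GC3)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Skip codons containing ambiguous bases (e.g. N) so they do not bias the ratios.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    result = {}
    for pos, name in enumerate(("GC1", "GC2", "GC3")):
        excluded = EXCLUDED_GC3 if name == "GC3" else EXCLUDED_ALL
        usable = [c for c in codons if c not in excluded]
        gc = sum(1 for c in usable if c[pos] in "GC")
        result[name] = gc / len(usable) if usable else float("nan")
    return result


if __name__ == "__main__":
    print(positional_gc("ATGGCCGATTGGATTAAGTAA"))
```

For the toy sequence `ATGGCCGATTGGATTAAGTAA`, five codons remain for GC1 and GC2 and three for GC3, giving GC1 = 0.4, GC2 = 0.2 and GC3 = 2/3.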

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.