Temporal evolution and adaptation of SARS-CoV-2 codon usage



Abstract

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first identified in Wuhan, China, in December 2019. Since the outbreak, it has accumulated mutations in its coding sequences as it adapts to the human host. Identifying its genetic variants has become crucial for tracking and evaluating their spread across the globe.

Methods: In this study, we compared 320,338 SARS-CoV-2 genomes isolated worldwide with the first genome sequenced in Wuhan, China. To this end, we analysed the codon usage patterns over time of the SARS-CoV-2 genes encoding the membrane protein (M), envelope protein (E), spike surface glycoprotein (S), nucleoprotein (N), RNA-dependent RNA polymerase (RdRp) and ORF1ab.

Results: We found that the genes coding for the N and S proteins have diverged more rapidly since the outbreak by accumulating mutations. Interestingly, all genes show a deoptimization of their codon usage with respect to the human host. Our findings suggest a general evolutionary trend in which SARS-CoV-2 evolves towards a suboptimal codon usage bias that favours host survival and, with it, the spread of the virus. Furthermore, we found that the S protein and RdRp are increasingly subject to purifying pressure over time, which implies that these proteins will become progressively less tolerant of mutations. In contrast, the N and M proteins tend to evolve more under the action of mutational bias, thus exploring a larger region of their sequence space.

Conclusions: Overall, our study sheds more light on the evolution of SARS-CoV-2 genes and their adaptation to humans, helping to foresee their mutation patterns and the emergence of new variants.

Article activity feed

  1. SciScore for 10.1101/2020.05.29.123976:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    We downloaded these PPI from NDEx (https://public.ndexbio.org/network/43803262-6d69-11ea-bfdc-0ac135e8bacf).
    https://public.ndexbio.org/network/43803262-6d69-11ea-bfdc-0ac135e8bacf
    suggested: None
    Software and Algorithms
    Sentences | Resources
    2.2 Relative Synonymous Codon Usage: RSCU vectors for all the genomes were computed by using an in-house Python script, following the formula:

    $$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_j}$$

    In RSCU_i, X_i is the number of occurrences of codon i in a given genome, and the sum in the denominator runs over the n_i synonymous codons of the amino acid encoded by codon i.

    Python
    suggested: (IPython, RRID:SCR_001658)
    We then showed the average values of the distance over time with a heatmap, drawn with MATLAB.
    MATLAB
    suggested: (MATLAB, RRID:SCR_001622)
    The protein sequences were aligned using Biopython.
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    To detect communities of PPI, we used the application Molecular Complex Detection (MCODE) [32] in Cytoscape (https://cytoscape.org/).
    Cytoscape
    suggested: (Cytoscape, RRID:SCR_003032)
    suggested: (CluePedia Cytoscape plugin, RRID:SCR_015784)
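    The RSCU formula and the distance-over-time heatmap quoted in Table 2 can be illustrated with a short Python sketch. The authors' in-house script is not available, so this is not their implementation. It computes RSCU over all 64 codons of the standard genetic code, keeping stop codons and the single-codon amino acids (Met, Trp), which many analyses drop, and it assumes RSCU is 0.0 for codons of an amino acid that never occurs. The quoted text does not say which distance was averaged: we assume, hypothetically, a Euclidean distance between each sample's RSCU vector and that of the reference Wuhan gene, averaged per calendar month of collection. The names rscu, euclidean, monthly_mean_distance and heatmap_matrix are illustrative only.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict
from datetime import date

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Group codons by the amino acid (or stop, '*') they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(sequence: str) -> dict[str, float]:
    """RSCU_i = X_i / ((1/n_i) * sum of X_j over the n_i synonymous codons)."""
    seq = sequence.upper().replace("U", "T")
    # Count in-frame codons; a trailing partial codon is ignored, and codons
    # with ambiguous bases (e.g. 'N') are counted but never looked up.
    counts = Counter(seq[p:p + 3] for p in range(0, len(seq) - 2, 3))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Assumption: RSCU is set to 0.0 when the amino acid never occurs.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def euclidean(u: dict[str, float], v: dict[str, float]) -> float:
    """Hypothetical choice of distance between two RSCU vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def monthly_mean_distance(samples, reference):
    """samples: iterable of (collection date, gene sequence);
    reference: RSCU vector of the same gene in the Wuhan genome."""
    buckets = defaultdict(list)
    for collected, seq in samples:
        buckets[(collected.year, collected.month)].append(euclidean(rscu(seq), reference))
    return {month: sum(d) / len(d) for month, d in sorted(buckets.items())}


def heatmap_matrix(samples_by_gene, reference_by_gene):
    """Rows: genes; columns: sorted (year, month); cells: mean distance, or None."""
    means = {g: monthly_mean_distance(s, reference_by_gene[g])
             for g, s in samples_by_gene.items()}
    months = sorted({m for row in means.values() for m in row})
    return months, {g: [row.get(m) for m in months] for g, row in means.items()}


if __name__ == "__main__":
    ref = rscu("GCTGCC")  # toy reference "gene": two alanine codons
    months, rows = heatmap_matrix(
        {"S": [(date(2020, 1, 5), "GCTGCC"),
               (date(2020, 1, 20), "GCTGCT"),
               (date(2020, 2, 3), "GCTGCC")]},
        {"S": ref},
    )
    print(months, rows)
```

    Each row returned by heatmap_matrix can be passed to any heatmap routine (the authors report using MATLAB); months without samples are left as None rather than imputed, so gaps in sampling stay visible.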

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.