Comparative analysis of codon usage patterns in SARS-CoV-2, its mutants and other respiratory viruses

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

The Coronavirus disease 2019 (COVID-19) outbreak caused by Severe Acute Respiratory Syndrome Coronavirus 2 virus (SARS-CoV-2) poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias in all the respiratory viruses, including SARS-CoV-2 isolates from different geographical locations (~62K), including two recently emerging strains from the United Kingdom (UK), i.e., VUI202012/01 and South Africa (SA), i.e., 501.Y.V2 codon usage bias (CUBs) analysis was performed. The analysis includes RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency and neutrality plot analysis. We were motivated to conduct the study to fulfil two primary aims: first, to identify the difference in codon usage bias amongst all SARS-CoV-2 genomes and, secondly, to compare their CUBs properties with other respiratory viruses. A biased nucleotide composition was found as most of the highly preferred codons were A/U-ending in all the respiratory viruses studied here. Compared with the human host, the RSCU analysis led to the identification of 11 over-represented codons and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining the CUBs. The present study results yield a better understanding of codon usage preferences for SARS-CoV-2 genomes and discover the possible evolutionary determinants responsible for the biases found among the respiratory viruses, thus unveils a unique feature of the SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first attempt at comparative CUBs analysis on the worldwide genomes of SARS-CoV-2, including novel emerged strains and other respiratory viruses.

Article activity feed

  1. SciScore for 10.1101/2021.03.03.433699: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    The complete genome sequences for different respiratory viruses including SARS-CoV-2 (NC_045512.2), SARS-CoV (NC_004718.3), Respiratory Syncytial
    SARS-CoV-2
    suggested: (Active Motif Cat# 91351, RRID:AB_2847848)
    The value of GC12 (mean GC content at 1st and 2nd position of the codon) and GC3 (GC content at the 3rd position of the codon) of different clades, along with all respiratory viruses, were calculated by using the EMBOSS CUSP program (http://emboss.sourceforge.net).
    EMBOSS
    suggested: (EMBOSS, RRID:SCR_008493)
    Phylogenetic analysis: A phylogenetic tree was constructed of complete genome sequences using MEGA 6.0 software [47] using the Neighbor-joining (NJ) algorithm and Kimura-2 parameter model on 1000 bootstrap replicates.
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.