Analysis of the Dynamics and Distribution of SARS-CoV-2 Mutations and its Possible Structural and Functional Implications

This article has been reviewed by the following groups

Abstract

Eight months after the pandemic was declared, COVID-19 has not been controlled globally. Several efforts to control SARS-CoV-2 dissemination are still under way, including vaccines and drug treatments. The effectiveness of these interventions depends, in part, on the regions they target not varying considerably. Although the mutation rate of SARS-CoV-2 is known to be relatively low, it is necessary to monitor the adaptation and evolution of the virus across the different stages of the pandemic. Thus, identifying mutations, analyzing their dynamics, and assessing their possible functional and structural implications are relevant. Here, we first estimate the number of COVID-19 cases caused by a virus carrying a specific mutation and then calculate its global relative frequency (NRFp). Using this approach on a dataset of 100 924 genomes from GISAID, we identified 41 mutations estimated to be present in viruses from 750 000 global COVID-19 cases (0.03 NRFp). We classified these mutations into three groups: high-frequent, low-frequent non-synonymous, and low-frequent synonymous. Analysis of the dynamics of these mutations by month and continent showed that high-frequent mutations appeared early in the pandemic, are present on all continents, and in some cases are almost fixed in the global population. In contrast, low-frequent mutations (non-synonymous and synonymous) appeared late in the pandemic and seem to be at least partially continent-specific. A possible explanation is that high-frequent mutations appeared early, before lockdown policies had been applied, whereas low-frequent mutations appeared after lockdown policies were in place, which prevented their global dissemination. Finally, we present a brief structural and functional review of the analyzed ORFs and the possible implications of the 25 identified non-synonymous mutations.
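
The abstract does not spell out how the case estimate and NRFp are computed, so the following is only a minimal sketch of one plausible reading: the relative frequency of a mutation in each month-continent group is multiplied by the reported cases in that group, the products are summed to give an estimated number of cases, and that sum is divided by the total number of cases. The function names, the dictionary layout, and the example numbers are all hypothetical and are not taken from the article.

```python
from typing import Dict, Tuple

# Hypothetical key: (month, continent), e.g. ("2020-03", "Europe").
Group = Tuple[str, str]


def estimate_cases_with_mutation(
    rel_freq: Dict[Group, float],
    cases: Dict[Group, int],
) -> float:
    """Sum over groups of (relative frequency in group) * (cases in group)."""
    total = 0.0
    for group, n_cases in cases.items():
        # Assumption: a group with no frequency estimate contributes zero.
        total += rel_freq.get(group, 0.0) * n_cases
    return total


def normalized_relative_frequency(
    rel_freq: Dict[Group, float],
    cases: Dict[Group, int],
) -> float:
    """Estimated cases carrying the mutation divided by all reported cases."""
    total_cases = sum(cases.values())
    if total_cases == 0:
        raise ValueError("no reported cases to normalize by")
    return estimate_cases_with_mutation(rel_freq, cases) / total_cases


# Made-up illustration values, not data from the study.
example_freq = {("2020-03", "Europe"): 0.5, ("2020-03", "Asia"): 0.25}
example_cases = {("2020-03", "Europe"): 1000, ("2020-03", "Asia"): 3000}
example_estimate = estimate_cases_with_mutation(example_freq, example_cases)
example_nrf = normalized_relative_frequency(example_freq, example_cases)
```

Under this reading, the pairing of 750 000 cases with 0.03 NRFp in the abstract would correspond to a denominator of about 25 million cases (750 000 / 0.03). Weighting by cases rather than by genomes would keep heavily sequenced regions from dominating the global estimate, although whether the authors normalize in exactly this way is an assumption here.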

Article activity feed

  1. SciScore for 10.1101/2020.11.13.381228:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Finally, we bound the 8 alignments using cat function in Linux and use this to extract regions corresponding to each of the ORFs and nsp regions of SARS-CoV-2 (regions as annotated in the NCBI database of the Wuhan-Hu-1 reference genome).
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    After that, sequences were divided by continent-month combinations, aligned using MAFFT with FFT-NS-2 strategy and default parameter settings (Katoh et al. 2002), columns with more than 98 % gaps were removed and relative frequencies of each base or gap in each position were calculated (RFp,m−c).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The number of cases of each country was obtained from the European Centre for Disease Prevention and Control: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide.
    Control
    suggested: None
    Potential energy of mutational and wild-type models was minimized using Gromacs (v.2018.8) (Berendsen et al. 1995).
    Gromacs
    suggested: (GROMACS, RRID:SCR_014565)
    All structural images were produced using ChimeraX (v.1.1) (Pettersen et al. 2020) or Chimera (v.1.15) (Pettersen et al. 2004).
    ChimeraX
    suggested: (UCSF ChimeraX, RRID:SCR_015872)
    Chimera
    suggested: (Chimera, RRID:SCR_002959)
    2014) with the ggplot2 package (Wickham 2016).
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.