Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2

Abstract

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates; these two mutation types account for the majority of mutations in the SARS-CoV-2 genome and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are highly homoplasic). We observe an effect of genomic context on mutation rates, but this effect is limited overall. Although previous studies have suggested that selection acts to decrease U content at synonymous sites, we present evidence suggesting the opposite.
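
The pitfall of ignoring compositional skews can be made concrete with a minimal sketch, assuming that mutation events have already been inferred on the phylogeny and counted by type. Each type X→Y is divided by the number of sites carrying the ancestral base X, so that a common base is not mistaken for a highly mutable one. The function `relative_rates` and all input counts are hypothetical illustrations, not the authors' code or data; restricting `base_counts` to, for example, synonymous sites would be one way to also account for skews in the genetic code.

```python
def relative_rates(event_counts, base_counts):
    """Per-site relative rates for each mutation type, scaled so the maximum is 1.

    event_counts: {"C>U": n, ...} inferred mutation events per type; recurrent
                  (homoplasic) events at the same site are counted separately.
    base_counts:  {"C": n, ...} number of sites carrying each ancestral base.
    """
    raw = {}
    for mut_type, n_events in event_counts.items():
        ancestral_base = mut_type.split(">")[0]
        # Divide by the number of opportunities: a common base collects more
        # events simply because there are more sites where it can mutate.
        raw[mut_type] = n_events / base_counts[ancestral_base]
    top = max(raw.values())
    return {mut_type: rate / top for mut_type, rate in raw.items()}


# Hypothetical toy counts, for illustration only (not data from the study).
events = {"C>U": 300, "G>U": 150, "U>C": 120}
sites = {"C": 3000, "G": 1500, "U": 6000}
print(relative_rates(events, sites))
```

With these made-up counts, C→U and G→U come out equal and five times higher than U→C, even though U→C has almost as many raw events as G→U; this only illustrates the normalisation, not the study's estimates.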

Article activity feed

  1. SciScore for 10.1101/2021.01.14.426705

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: "Second, a global alignment was created by aligning every sequence individually to the NC_045512.2 accession from NCBI, using MAFFT v7.471 [52], faSplit (http://hgdownload.soe.ucsc.edu/admin/exe/), faSomeRecords (https://github.com/ENCODE-DCC/kentUtils), and GNU parallel [53]."
    Resource: MAFFT; suggested: (MAFFT, RRID:SCR_011811)

    Sentence: "Second, the starting tree was optimised using FastTree 2 with 2 rounds of Subtree Pruning and Regrafting (SPR) using moves of length 1000 under a minimum evolution optimisation regime, and the tree was then further optimised using multiple rounds of Maximum Likelihood Nearest Neighbour Interchange (NNI) moves until no further improvement to the tree could be achieved using NNI."
    Resource: FastTree; suggested: (FastTree, RRID:SCR_015501)
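
The alignment step quoted above (split the genomes into single records, then align each one against NC_045512.2 in parallel) can be sketched in Python as follows. This is an illustration under assumptions: `split_fasta` and `mafft_commands` are hypothetical helpers standing in for faSplit and a GNU parallel job list, and the MAFFT options `--keeplength --addfragments` (reference-anchored alignment that preserves reference coordinates) are our assumption, not necessarily the options the authors used. The commands are only built here, not run; each would write its alignment to standard output.

```python
import os
import tempfile


def split_fasta(text):
    """Parse multi-FASTA text into (name, sequence) tuples, one per record.

    Hypothetical stand-in for faSplit: the record name is the first
    whitespace-delimited word of the header line.
    """
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def mafft_commands(records, reference_path, out_dir):
    """Write each record to its own FASTA file and return one MAFFT command per file.

    ASSUMPTION: the options below are illustrative, not the options reported
    in the paper.
    """
    commands = []
    for name, seq in records:
        # Virus names such as "hCoV-19/..." contain "/", which cannot appear in file names.
        safe_name = name.replace("/", "_")
        path = os.path.join(out_dir, f"{safe_name}.fa")
        with open(path, "w") as fh:
            fh.write(f">{name}\n{seq}\n")
        commands.append(
            ["mafft", "--keeplength", "--addfragments", path, reference_path]
        )
    return commands


# Usage with two toy records (not real genomes).
demo = ">hCoV-19/toy/1 extra text\nACGT\n>toy2\nAC\nGG\n"
with tempfile.TemporaryDirectory() as tmp:
    for cmd in mafft_commands(split_fasta(demo), "NC_045512.2.fasta", tmp):
        print(" ".join(cmd))
```

Each command list could then be passed to `subprocess.run` with `stdout` redirected to a per-sequence output file, or written out one per line as a job list for GNU parallel.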

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    While we tried to account for possible biases as far as we could, our methods still have some limitations. First of all, our inference of mutation events is based on a prior phylogenetic inference, but tree inference from SARS-CoV-2 data is typically not very reliable, in part due to the low genetic diversity among sequences, but also due to homoplasic mutations [43, 23]. As mentioned above, our phylogenetic inference might also have been negatively affected by the choice of substitution models; however, more realistic models such as UNREST are currently either not implemented or numerically unstable in sufficiently efficient phylogenetic packages such as [41, 44, 42]. In this study we tried not to rely excessively on individual inferences of mutation events, but rather focused on general patterns averaged over many sites and clades, which we think should provide robust inference even though the inference of individual mutation events might not be reliable. However, a potential bias that might affect our results derives from the fact that some sites are very homoplasic, and our phylogenetic inference might lead to an over-parsimonious inference of their mutational history. This, in turn, might lead us to underestimate their mutation rate and overestimate their number of descendant tips per mutation event. In the future, a Bayesian phylogenetic approach might be useful to assess and possibly resolve this issue, and to evaluate its impact on our inference of selective pressur...
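
The concern about over-parsimonious reconstruction can be illustrated with Fitch parsimony, which computes the minimum number of changes that one alignment column requires on a fixed rooted binary tree. This is a swapped-in textbook technique, not the inference method used in the study, and the nested-tuple tree format and the `fitch` function are hypothetical. Because parsimony returns a lower bound, a highly homoplasic site whose true history contains more mutation events than this minimum would have its event count, and hence its rate, underestimated, while each inferred event would appear to have more descendant tips than it really does.

```python
def fitch(tree):
    """Fitch parsimony for one alignment column on a rooted binary tree.

    tree: a leaf base such as "C", or a (left, right) tuple of subtrees.
    Returns (candidate root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # Children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No shared state: one change is needed on one of the two child branches.
    return left_states | right_states, left_cost + right_cost + 1


# Two clades, each with an independent C->U change at the same site.
tree = ((("C", "U"), "C"), (("C", "U"), "C"))
root_states, n_changes = fitch(tree)
print(root_states, n_changes)  # {'C'} 2
```

In this toy tree, two independent C→U events in different clades give a minimum of 2 changes at a single site, which is the homoplasy pattern described in the abstract; any additional back-and-forth mutations at that site would be invisible to this count.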

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.