No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2

This article has been reviewed by the following groups


Abstract

COVID-19 is caused by the coronavirus SARS-CoV-2, which jumped into the human population in late 2019 from a currently uncharacterised animal reservoir. Due to this recent association with humans, SARS-CoV-2 may not yet be fully adapted to its human host. This has led to speculation that SARS-CoV-2 may be evolving towards higher transmissibility. The most plausible mutations under putative natural selection are those which have emerged repeatedly and independently (homoplasies). Here, we formally test whether any homoplasies observed in SARS-CoV-2 to date are significantly associated with increased viral transmission. To do so, we develop a phylogenetic index to quantify the relative number of descendants in sister clades with and without a specific allele. We apply this index to a curated set of recurrent mutations identified within a dataset of 46,723 SARS-CoV-2 genomes isolated from patients worldwide. We do not identify a single recurrent mutation in this set convincingly associated with increased viral transmission. Instead, recurrent mutations currently in circulation appear to be evolutionarily neutral and primarily induced by the human immune system via RNA editing, rather than being signatures of adaptation. At this stage we find no evidence for significantly more transmissible lineages of SARS-CoV-2 due to recurrent mutations.
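The abstract describes the phylogenetic index only in words. Below is a minimal Python sketch of an index of this kind. It assumes a rooted tree held as nested `Node` objects and assumes the node at which an allele arose is already known.

- For each independent emergence, it divides the number of tips below the allele-carrying node by the number of tips in its sister clade(s). A ratio above 1 means the clade carrying the allele left more sampled descendants.
- `Node`, `n_tips`, `descendant_ratio` and `sign_test_p` are hypothetical names.
- The two-sided exact sign test is an illustrative choice, not necessarily the test the authors used.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import comb


@dataclass
class Node:
    """A node of a rooted phylogeny; a node without children is a tip."""
    name: str
    children: list[Node] = field(default_factory=list)


def n_tips(node: Node) -> int:
    """Count the tips (sampled genomes) descending from `node`."""
    if not node.children:
        return 1
    return sum(n_tips(child) for child in node.children)


def descendant_ratio(parent: Node, with_allele: Node) -> float:
    """Tips in the clade carrying the allele divided by tips in its sister clade(s).

    `parent` must have at least two children, one of which is `with_allele`."""
    sisters = [c for c in parent.children if c is not with_allele]
    n_without = sum(n_tips(s) for s in sisters)
    return n_tips(with_allele) / n_without


def sign_test_p(ratios: list[float]) -> float:
    """Two-sided exact sign test of H0: an allele-carrying clade is as likely to
    have more descendants than its sister clade as fewer (ratios of 1 are ignored)."""
    above = [r > 1 for r in ratios if r != 1]
    n, k = len(above), sum(above)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, p)
```

For example, an allele that arose at the root of a three-tip clade whose sister is a single tip gives a ratio of 3. If five independent emergences all give ratios above 1, `sign_test_p` returns 2/32 = 0.0625.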

Article activity feed

  1. SciScore for 10.1101/2020.05.21.108506

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    , GISAID EPI_ISL_402125) using MAFFT [51] implemented in the rapid phylodynamic alignment pipeline provided by Augur (github.com/nextstrain/augur).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Subsequently, for both alignments, a maximum likelihood phylogenetic tree was built using IQ-TREE 2.1.0 Covid release (https://github.com/iqtree/iqtree2/releases/tag/v2.1.0) as the tree-building method [52].
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
    For the less stringent masking of the alignment, HomoplasyFinder identified a total of 5,793 homoplasies (Figure S5).
    HomoplasyFinder
    suggested: (HomoplasyFinder, RRID:SCR_017300)
    This was done by retrieving the amino acid changes corresponding to all SNPs at these positions using a custom Biopython (v.1.76) script (https://github.com/cednotsed/nucleotide_to_AA_parser.git).
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    We simulated a 10,000 nucleotide alignment comprising 1,000 accessions using the rtree() simulator available in Ape v5.3 [59] and genSeq from the R package PhyTools v0.7-2.0 [60] using a single rate transition matrix multiplied by a rate of 6×10⁻⁴ to approximately match that estimated in [1].
    PhyTools
    suggested: (phytools, RRID:SCR_015502)
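The simulation quoted in the last row was run in R with Ape's `rtree()` and phytools' `genSeq`. Below is a rough Python analogue, not a reimplementation of either R function. It rests on these assumptions:

- The tree is a random rooted binary topology with branch lengths drawn uniformly from (0, 1).
- Substitution follows a Jukes-Cantor-style single-rate matrix, with every off-diagonal entry equal to `rate`.
- Under that matrix, a site on a branch of length t changes with probability 3/4 * (1 - exp(-4 * rate * t)), and the new base is drawn uniformly from the other three.
- `random_tree`, `evolve_branch` and `simulate_alignment` are hypothetical names.

```python
import math
import random

BASES = "ACGT"


def random_tree(n_tips, rng):
    """Random rooted binary tree built by repeatedly joining two random lineages.

    Tips are 0..n_tips-1; returns (children, branch_length, root)."""
    lineages = list(range(n_tips))
    children, branch_length = {}, {}
    next_id = n_tips
    while len(lineages) > 1:
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        children[next_id] = [a, b]
        branch_length[a] = rng.random()  # uniform(0, 1) branch lengths (assumption)
        branch_length[b] = rng.random()
        lineages.append(next_id)
        next_id += 1
    return children, branch_length, lineages[0]


def evolve_branch(seq, t, rate, rng):
    """Evolve `seq` along a branch of length t under a single-rate (JC69-style) matrix."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * rate * t))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_alignment(n_tips, length, rate, seed=0):
    """Return one simulated sequence per tip, in tip order."""
    rng = random.Random(seed)
    children, branch_length, root = random_tree(n_tips, rng)
    seqs = {root: "".join(rng.choice(BASES) for _ in range(length))}
    stack = [root]
    while stack:  # walk from the root towards the tips
        node = stack.pop()
        for child in children.get(node, []):
            seqs[child] = evolve_branch(seqs[node], branch_length[child], rate, rng)
            stack.append(child)
    return [seqs[tip] for tip in range(n_tips)]
```

`simulate_alignment(1000, 10_000, 6e-4)` uses the dimensions and rate quoted above. Because it is pure Python, that full setting takes a while to run.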
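The Biopython step quoted above retrieves the amino acid change corresponding to each SNP. The authors' script is not reproduced here. What follows is a minimal, dependency-free sketch of the same idea, with these assumptions and conventions:

- The SNP's 0-based offset within an in-frame coding sequence is already known.
- Translation uses the standard genetic code.
- `CODON_TABLE` and `aa_change` are hypothetical names.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AAS)}


def aa_change(cds: str, offset: int, alt: str) -> str:
    """Amino-acid change (e.g. 'D2G') caused by base `alt` at 0-based `offset`
    of an in-frame coding sequence; stop codons are written as '*'."""
    codon_index = offset // 3
    start = codon_index * 3
    ref_codon = cds[start:start + 3].upper()
    pos = offset % 3  # position of the SNP within its codon
    alt_codon = ref_codon[:pos] + alt.upper() + ref_codon[pos + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_index + 1}{CODON_TABLE[alt_codon]}"
```

A synonymous change keeps the same letter on both sides, e.g. `aa_change("ATGGATTAA", 5, "C")` gives `"D2D"`.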

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    However, we acknowledge this approach has some limitations. We have, for example, relied on admittedly arbitrary choices concerning the number of minimal observations and nodes required to conduct statistical testing. While it seems unlikely this would change our overall conclusions, which are highly consistent for two tested alignments, results for particular mutations should be considered in light of this caveat and may change as more genomes become available. Further, our approach necessarily entails some loss of information and therefore statistical power. This is because our motivation to test independent occurrences means that we do not handle “embedded homoplasies” explicitly: we simply discard them (Figure 2), although inclusion of embedded homoplasies does not change the overall conclusions (Figure S11b). Finally, while our approach is undoubtedly more robust to demographic confounding (such as founder bias), it is impossible to completely remove all the sources of bias that come with the use of available public genomes. In addition, it is of note that the SARS-CoV-2 population has only acquired moderate genetic diversity since its jump into the human population and, consequently, most branches in the phylogenetic tree are only supported by very few mutations. As a result of the low genetic diversity, most nodes in the tree have only low statistical support [41]. This prompted us to apply a series of stringent filters and masking strategies to the alignment (see Meth...
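The limitations paragraph notes two filters. "Embedded homoplasies" are discarded so that only independent occurrences are tested, and a minimum number of observations is required before testing. Below is a minimal sketch of both filters, with these assumptions:

- Input is a `parent` map from each node to its parent, plus the list of nodes where an allele was inferred to arise.
- An emergence counts as embedded if one of its ancestors is also an emergence of the same allele. This reading of the term is itself an assumption.
- `drop_embedded`, `is_testable` and `min_independent` are hypothetical names. The default of 3 is a placeholder, not the authors' threshold.

```python
def drop_embedded(emergences, parent):
    """Keep only emergence nodes with no ancestor that is itself an emergence
    of the same allele (i.e. discard 'embedded' occurrences)."""
    emerged = set(emergences)
    kept = []
    for node in emergences:
        ancestor = parent.get(node)
        while ancestor is not None and ancestor not in emerged:
            ancestor = parent.get(ancestor)
        if ancestor is None:  # reached the root without meeting another emergence
            kept.append(node)
    return kept


def is_testable(emergences, parent, min_independent=3):
    """True if enough independent (non-embedded) emergences remain to run a test."""
    return len(drop_embedded(emergences, parent)) >= min_independent
```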

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.05.21.108506

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    GISAID EPI_ISL_402125) using MAFFT [41] implemented in the rapid phylodynamic alignment pipeline provided by Augur (github.com/nextstrain/augur).
    MAFFT
    suggested: (MAFFT, SCR_011811)
    Subsequently, for both alignments, a maximum likelihood phylogenetic tree was built using the Augur tree implementation selecting IQ-TREE as the tree-building method [42].
    IQ-TREE
    suggested: (IQ-TREE, SCR_017254)
    To further validate our 273 detected homoplasies, we obtained the 5,411 short-read datasets available on the NCBI Sequence Read Archive (SRA) as of 11th May 2020.
    NCBI Sequence Read Archive
    suggested: (NCBI Sequence Read Archive (SRA), SCR_004891)
    Mapping to WuhanHu-1 was performed using BWA-MEM [50].
    BWA-MEM
    suggested: (Sniffles, SCR_017619)
    After PCR-duplicate removal using PicardTools MarkDuplicates v.2.7.0, SNPs were called using Freebayes v.
    PicardTools
    suggested: None
    Freebayes
    suggested: (FreeBayes, SCR_010761)
    This was done by retrieving the amino acid changes corresponding to all SNPs at these positions using a custom Biopython (v.1.76) script (https://github.com/cednotsed/nucleotide_to_AA_parser.git).
    Biopython
    suggested: (Biopython, SCR_007173)
    HomoplasyFinder [46] flags all nodes of a phylogeny corresponding to an ancestor that acquired a homoplasy.
    HomoplasyFinder
    suggested: (HomoplasyFinder, SCR_017300)
    Computational analyses were performed on the UCL Computer Science cluster and the South Green bioinformatics platform hosted on the CIRAD HPC cluster.
    CIRAD
    suggested: (CIRAD, SCR_011153)
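HomoplasyFinder is described above as flagging the nodes at which an ancestor acquired a homoplasy. A standard way to flag homoplastic sites, and the basis of HomoplasyFinder, is the consistency index (CI). The CI is the minimum number of changes a site could need, given its observed states, divided by the number of changes it actually requires on the tree. A CI below 1 means some state arose more than once.

The sketch below computes the CI with Fitch parsimony. It assumes a rooted binary tree given as nested 2-tuples of tip names and handles one site at a time. It illustrates the concept only and is not HomoplasyFinder's code. `fitch_changes` and `consistency_index` are hypothetical names.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes (Fitch parsimony) for one site.

    `tree` is a rooted binary tree of nested 2-tuples whose leaves are tip
    names; `states` maps each tip name to its base at the site."""
    def visit(node):
        if isinstance(node, str):  # a tip
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def consistency_index(tree, states):
    """CI = (number of observed states - 1) / parsimony changes; CI < 1 flags a homoplasy."""
    changes = fitch_changes(tree, states)
    if changes == 0:
        return 1.0
    return (len(set(states.values())) - 1) / changes
```

Take the tree `(("t1", "t2"), ("t3", "t4"))` with A at t1 and t3 and G at t2 and t4. The states cut across both cherries, so the site needs two changes and its CI is 0.5.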
    

    Results from OddPub: Thank you for sharing your code.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.