SARS-CoV-2 convergent evolution cannot be reliably inferred from phylogenetic analyses

Abstract

A homoplasy is a trait shared between individuals that did not arise in a common ancestor, but rather is the result of convergent evolution. SARS-CoV-2 homoplasic mutations are important to characterise, because the evidence for a mutation conferring a fitness advantage is strengthened if this mutation has evolved independently and repeatedly in separate viral lineages. Yet detecting homoplasy is difficult because there is insufficient variation between sequences to construct reliable phylogenetic trees. Here, we develop a method to identify homoplasies more robustly. We derive a maximum likelihood (ML) tree, with taxa bearing seemingly recurrent mutations dispersed across the tree, and then, for each potentially homoplasic mutation, we derive an alternative tree where the same taxa are constrained to one clade such that the mutation is no longer homoplasic. We then compare how well the two trees fit the sequence data. Applying this method to SARS-CoV-2 yields only a few instances where the constrained trees have significantly less statistical support than the unconstrained tree, suggesting that phylogenetics can provide only limited support for homoplasy in SARS-CoV-2 and that caution is needed when inferring evidence of convergent evolution from phylogenetic methods in the absence of evidence from other sources.
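
The constraint step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes taxon names are already Newick-safe (no parentheses, commas, colons, semicolons, brackets, quotes, or whitespace), and the function name `constraint_newick` is hypothetical. It builds a two-clade constraint of the form ((a1,a2,…),(b1,b2,…)), where the first group holds the taxa carrying the mutation and the second group holds the remaining taxa.

```python
# Characters that would need quoting or escaping in a Newick label.
FORBIDDEN = set("(),:;[]'")


def constraint_newick(with_mutation, without_mutation):
    """Return a two-clade constraint tree as a Newick string.

    with_mutation:    names of taxa that carry the candidate mutation
    without_mutation: names of taxa that do not carry it
    """
    # Deduplicate and sort so the output is deterministic.
    group_a = sorted(set(with_mutation))
    group_b = sorted(set(without_mutation))

    # A split with fewer than two taxa on either side constrains nothing.
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two taxa")

    # A taxon cannot sit on both sides of the split.
    overlap = set(group_a) & set(group_b)
    if overlap:
        raise ValueError(f"taxa in both groups: {sorted(overlap)}")

    # Assumption: names are used unquoted, so reject unsafe characters.
    for name in group_a + group_b:
        if not name or any(ch in FORBIDDEN or ch.isspace() for ch in name):
            raise ValueError(f"taxon name is not Newick-safe: {name!r}")

    return "(({}),({}));".format(",".join(group_a), ",".join(group_b))


if __name__ == "__main__":
    # Hypothetical taxon names, for illustration only.
    print(constraint_newick(["seq3", "seq1"], ["seq2", "seq4"]))
```

Because the constraint only fixes one split, a constrained tree search remains free to resolve the relationships inside each group.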

Article activity feed

  1. SciScore for 10.1101/2021.05.15.444301:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    For each identified homoplasy, we constructed a constraint tree in the form ((a1,a2,a3…),(b1,b2,b3…)), where (a1,a2,a3…) included the taxa downstream of any of the homoplasic changes that had the resulting amino acid; and (b1,b2,b3…) comprised taxa that were not downstream of any of these changes and had a different amino acid. | b1 (suggested: None)
    Software and Algorithms
    Sentences | Resources
    Corresponding full-length human SARS-CoV-2 sequences were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) (https://www.gisaid.org/), where they are aligned to the reference Wuhan sequence using MAFFT. | MAFFT (suggested: MAFFT, RRID:SCR_011811)
    The alignment was further adjusted by hand using AliView v1.26 (Larsson, 2014). | AliView (suggested: AliView, RRID:SCR_002780)
    Statistical analysis: The AU test (Shimodaira, 2002) as implemented in IQ-TREE was used to determine whether the constrained, alternative tree topology was a significantly worse fit to the data than the unconstrained ML tree topology, for each homoplasic site. | IQ-TREE (suggested: IQ-TREE, RRID:SCR_017254)
    Trees were viewed on ChromaClade (Monit et al., 2019) and FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). | FigTree (suggested: FigTree, RRID:SCR_008515)
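
The topology comparison quoted in the table above (an unconstrained ML tree versus a constrained tree, compared with the AU test in IQ-TREE) could be driven along the following lines. This is a sketch under stated assumptions, not the authors' pipeline: the executable name `iqtree`, the file names, the `GTR+G` model, and the replicate count are placeholders that the source does not specify, and both function names are hypothetical. The flags used (`-s`, `-m`, `-g`, `-z`, `-n 0`, `-zb`, `-au`, `-pre`) are IQ-TREE options for constrained searches and tree topology tests. The code only builds the command lines and the candidate-tree file; it does not run IQ-TREE.

```python
from pathlib import Path


def iqtree_au_commands(alignment, constraint, prefix,
                       model="GTR+G", replicates=10000):
    """Build three IQ-TREE command lines as argument lists.

    Assumptions (not from the source): executable name, model, replicate
    count, and the output prefixes derived from `prefix`.
    """
    base = ["iqtree", "-s", alignment, "-m", model]

    # 1. Unconstrained ML tree search.
    unconstrained = base + ["-pre", f"{prefix}.unconstrained"]

    # 2. ML search restricted to topologies compatible with the constraint.
    constrained = base + ["-g", constraint, "-pre", f"{prefix}.constrained"]

    # 3. Evaluate both trees without a new search (-n 0) and run the AU test.
    au_test = base + [
        "-z", f"{prefix}.candidates.trees",
        "-n", "0",
        "-zb", str(replicates),
        "-au",
        "-pre", f"{prefix}.au",
    ]
    return unconstrained, constrained, au_test


def write_candidate_trees(tree_files, out_path):
    """Concatenate Newick tree files, one tree per line, for use with -z."""
    trees = [Path(p).read_text().strip() for p in tree_files]
    Path(out_path).write_text("\n".join(trees) + "\n")
    return len(trees)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in iqtree_au_commands("aln.fasta", "site.constraint.nwk", "site"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. After the two searches finish, their `.treefile` outputs would be combined with `write_candidate_trees` before running the third command, repeating the whole procedure for each candidate homoplasic site.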

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.