SARS-CoV-2 convergent evolution cannot be reliably inferred from phylogenetic analyses

Abstract

A homoplasy is a trait shared between individuals that did not arise in a common ancestor, but rather is the result of convergent evolution. SARS-CoV-2 homoplasic mutations are important to characterise, because the evidence for a mutation conferring a fitness advantage is strengthened if this mutation has evolved independently and repeatedly in separate viral lineages. Yet detecting homoplasy is difficult because there is too little variation between sequences to construct reliable phylogenetic trees. Here, we develop a method to identify homoplasies more robustly. We derive a maximum likelihood (ML) tree, with taxa bearing seemingly recurrent mutations dispersed across the tree, and then, for each potentially homoplasic mutation, we derive an alternative tree where the same taxa are constrained to one clade such that the mutation is no longer homoplasic. We then compare how well the two trees fit the sequence data. Applying this method to SARS-CoV-2 yields only a few instances where the constrained trees have significantly less statistical support than the unconstrained tree, suggesting that phylogenetics can provide only limited support for homoplasy in SARS-CoV-2 and that caution is needed when inferring convergent evolution from phylogenetic methods in the absence of evidence from other sources.

SciScore for 10.1101/2021.05.15.444301:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Experimental Models: Organisms/Strains
Sentences	Resources
For each identified homoplasy, we constructed a constraint tree in the form ((a1,a2,a3…),(b1,b2,b3…)), where (a1,a2,a3…) included the taxa downstream of any of the homoplasic changes that had the resulting amino acid; and (b1,b2,b3…) comprised taxa that were not downstream of any of these changes and had a different amino acid.	b1 suggested: None
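
The constraint tree format quoted above can be illustrated with a short Python sketch. This is not the authors' code: the function name `constraint_newick` and the input lists are hypothetical, and the sketch assumes that taxon names contain no characters that would need quoting in Newick.

```python
# Hypothetical helper, not from the paper: builds a two-clade constraint
# tree of the form ((a1,a2,a3...),(b1,b2,b3...)); as a Newick string.

# Characters that cannot appear in an unquoted Newick label.
FORBIDDEN = set("()[]':;, \t\n")


def constraint_newick(with_mutation, without_mutation):
    """Return a Newick string grouping the two sets of taxa into two clades.

    with_mutation:    taxa downstream of a homoplasic change that carry
                      the resulting amino acid (the "a" group).
    without_mutation: taxa not downstream of any such change that carry
                      a different amino acid (the "b" group).
    """
    clade_a = sorted(set(with_mutation))
    clade_b = sorted(set(without_mutation))

    shared = set(clade_a) & set(clade_b)
    if shared:
        raise ValueError(f"taxa assigned to both groups: {sorted(shared)}")
    if not clade_a or not clade_b:
        raise ValueError("both groups need at least one taxon")

    # Assumption: names are rejected rather than quoted if they contain
    # Newick metacharacters or whitespace.
    for name in clade_a + clade_b:
        if not name or FORBIDDEN & set(name):
            raise ValueError(f"taxon name needs quoting: {name!r}")

    return "(({}),({}));".format(",".join(clade_a), ",".join(clade_b))


if __name__ == "__main__":
    print(constraint_newick(["a1", "a2", "a3"], ["b1", "b2", "b3"]))
```

Taxa that appear in neither list are simply left out of the string, so the sketch says nothing about how unlisted taxa should be treated during the constrained search.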
Software and Algorithms
Sentences	Resources
Corresponding full-length human SARS-CoV-2 sequences were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) (https://www.gisaid.org/), which GISAID aligns to the reference Wuhan sequence using MAFFT.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
The alignment was further adjusted by hand using AliView v1.26 (Larsson, 2014).	AliView suggested: (AliView, RRID:SCR_002780)
Statistical analysis: The AU test (Shimodaira, 2002) as implemented in IQ-TREE was used to determine whether the constrained, alternative tree topology was a significantly worse fit to the data than the unconstrained ML tree topology, for each homoplasic site.	IQ-TREE suggested: (IQ-TREE, RRID:SCR_017254)
Trees were viewed on ChromaClade (Monit et al., 2019) and FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).	FigTree suggested: (FigTree, RRID:SCR_008515)
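
The comparison of constrained and unconstrained topologies described in the statistical analysis row can be sketched as three IQ-TREE invocations. The Python below only assembles the argument lists and does not run anything. It is an illustration under assumptions, not the authors' pipeline: the function name, file names, the `GTR+G` model and the replicate count are all hypothetical, and the options used (`-s`, `-m`, `-g`, `-z`, `-n 0`, `-zb`, `-au`, `-pre`) are IQ-TREE's documented flags for constrained search and tree topology tests, which should be checked against the installed version.

```python
# Hypothetical sketch: assemble IQ-TREE command lines for an AU test of a
# constrained versus an unconstrained topology. Nothing is executed here.


def iqtree_au_commands(alignment, constraint_tree, prefix,
                       model="GTR+G", replicates=10000, exe="iqtree"):
    """Return a dict of argument lists for the three steps.

    model, replicates and exe are assumptions; the source text does not
    state which substitution model or replicate count was used.
    """
    # Step 1: unconstrained ML tree search.
    unconstrained = [exe, "-s", alignment, "-m", model,
                     "-pre", f"{prefix}.unconstrained"]

    # Step 2: ML search restricted to trees compatible with the constraint.
    constrained = [exe, "-s", alignment, "-m", model,
                   "-g", constraint_tree,
                   "-pre", f"{prefix}.constrained"]

    # Step 3: evaluate both topologies with the AU test. The candidate file
    # is assumed to hold both trees, one Newick string per line.
    candidates = f"{prefix}.candidates.treels"
    au_test = [exe, "-s", alignment, "-m", model,
               "-z", candidates, "-n", "0",
               "-zb", str(replicates), "-au",
               "-pre", f"{prefix}.au"]

    return {"unconstrained": unconstrained,
            "constrained": constrained,
            "au_test": au_test}


def combine_trees(*newick_strings):
    """Join Newick strings into the one-tree-per-line text used with -z."""
    return "\n".join(s.strip() for s in newick_strings) + "\n"


if __name__ == "__main__":
    cmds = iqtree_au_commands("aln.fasta", "site.constraint.nwk", "site1")
    for step, args in cmds.items():
        print(step, " ".join(args))
```

Each list could be passed to `subprocess.run` once the two tree searches have finished and their resulting trees have been written to the candidate file with `combine_trees`. One such run would be needed per homoplasic site, since each site has its own constraint tree.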

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
