Phylogenetic analysis of SARS-CoV-2 data is difficult
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org . Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising all virus sequences available on May 5, 2020 from gisaid.org . We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be possible. Finally, an automatic classification of the current sequences into sub-classes based on statistical criteria is also not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
Article activity feed
-
SciScore for 10.1101/2020.08.05.239046: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization not detected. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:Finally, we find that distinct viral sub-classes can not be identified by executing our mPTP tool for molecular species delimitation on all trees in the respective plausible tree sets of all four alignment versions.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any …
SciScore for 10.1101/2020.08.05.239046: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization not detected. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:Finally, we find that distinct viral sub-classes can not be identified by executing our mPTP tool for molecular species delimitation on all trees in the respective plausible tree sets of all four alignment versions.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
-