Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic
This article has been reviewed by the following groups.
Listed in:
- Evaluated articles (ScreenIT)
Abstract
As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of “genomic contact tracing”, that is, using viral genome sequences to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many-fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient, tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach improves the speed of phylogenetic placement of new samples and data visualization by orders of magnitude, making it possible to complete the placements under real-time constraints. Our method also provides the key ingredient for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the UCSC SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for laboratories worldwide.
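The abstract's central idea is a tree-based data structure in which placing a new sample reduces to comparing its mutations with those inherited along the tree. The Python sketch below illustrates that idea under stated assumptions and is not UShER's implementation. The names `Node` and `place_sample` are hypothetical. Each node stores the mutations on its incoming branch as (position, base) pairs, and no site is assumed to mutate twice along any root-to-leaf path. The sample may only be attached directly to an existing node, whereas UShER can also place a sample partway along a branch. Under these assumptions, the cost of attaching at a node is the size of the symmetric difference between the sample's mutations and the mutations accumulated from the root to that node.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a mutation is (genome position, alternate base).
Mutation = Tuple[int, str]


@dataclass
class Node:
    name: str
    # Mutations on the branch leading into this node.
    branch_mutations: FrozenSet[Mutation] = frozenset()
    children: List["Node"] = field(default_factory=list)


def place_sample(root: Node, sample: FrozenSet[Mutation]) -> Tuple[str, int]:
    """Return (node name, cost) for the cheapest node to attach `sample` to.

    Cost = |accumulated path mutations XOR sample mutations|, i.e. path
    mutations the sample lacks plus sample mutations the path lacks.
    Assumes no site mutates twice on a root-to-leaf path; on ties, the
    first node visited is kept.
    """
    best_name, best_cost = "", -1
    stack: List[Tuple[Node, FrozenSet[Mutation]]] = [(root, frozenset())]
    while stack:
        node, inherited = stack.pop()
        path = inherited | node.branch_mutations  # mutations from root to node
        cost = len(path ^ sample)
        if best_cost < 0 or cost < best_cost:
            best_name, best_cost = node.name, cost
        for child in node.children:
            stack.append((child, path))
    return best_name, best_cost


# Toy tree: root -> A (100T) -> A1 (200G); root -> B (300C).
root = Node("root", children=[
    Node("A", frozenset({(100, "T")}), [Node("A1", frozenset({(200, "G")}))]),
    Node("B", frozenset({(300, "C")})),
])
print(place_sample(root, frozenset({(100, "T"), (200, "G"), (400, "A")})))
# ('A1', 1): only one new mutation (400A) is needed below A1.
```

Because each node stores only its own branch mutations, a sample's full mutation set never has to be compared against every stored genome. This is the kind of compactness the abstract credits for the speedup.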
Software Availability
UShER is available to users through the UCSC Genome Browser at https://genome.ucsc.edu/cgi-bin/hgPhyloPlace. The source code and detailed instructions for compiling and running UShER are available at https://github.com/yatisht/usher.
Article activity feed
SciScore for 10.1101/2020.09.26.314971:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

| Sentences | Resources |
|---|---|
| UShER also maintains the minimum parsimony score of previously traversed nodes in a shared variable and terminates the computation of the set difference in a new node as soon as the parsimony score corresponding to it exceeds the value of this shared variable. | UShER (suggested: None) |
| We then optimized these two new trees using ten iterations of FastTree, followed by another round of optimization using the -gamma flag as described above. | FastTree (suggested: FastTree, RRID:SCR_015501) |
| This heavily reduced dataset can then be visualized using the existing code-base of the UCSC Genome Browser and we output a JSON-formatted file that can be viewed using auspice (https://nextstrain.github.io/auspice/). | UCSC Genome Browser (suggested: UCSC Genome Browser, RRID:SCR_005780) |

Results from OddPub: Thank you for sharing your code.
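The first UShER sentence in the resources table describes a branch-and-bound style shortcut. Once a node's running set-difference count exceeds the best parsimony score found so far, the rest of that comparison can be skipped. The Python sketch below illustrates this with a lock-protected shared minimum that several worker threads read. It is an illustration built on assumptions, not UShER's code:

- the names `SharedBest`, `bounded_cost`, and `place` are hypothetical;
- each node's root-to-node mutation set is assumed to be precomputed in a dictionary;
- ties are broken by node name only to make the result deterministic.

Python threads are used here only to mirror the shared-variable design. Because of the global interpreter lock, they give no real speedup for this CPU-bound work.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Optional, Tuple

Mutation = Tuple[int, str]  # hypothetical (position, alternate base)


class SharedBest:
    """Lock-protected best (lowest) parsimony score seen by any worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.score: Optional[int] = None
        self.node: Optional[str] = None

    def bound(self) -> Optional[int]:
        with self._lock:
            return self.score

    def offer(self, node: str, score: int) -> None:
        with self._lock:
            # Lower score wins; equal scores are broken by name for determinism.
            if (self.score is None or score < self.score
                    or (score == self.score and node < self.node)):
                self.score, self.node = score, node


def bounded_cost(path: FrozenSet[Mutation], sample: FrozenSet[Mutation],
                 bound: Optional[int]) -> Optional[int]:
    """Count |path XOR sample| lazily; return None once the count exceeds `bound`."""
    missing_in_sample = (m for m in path if m not in sample)
    new_in_sample = (m for m in sample if m not in path)
    cost = 0
    for _ in itertools.chain(missing_in_sample, new_in_sample):
        cost += 1
        if bound is not None and cost > bound:
            return None  # early termination: this node cannot beat the best
    return cost


def _score(name: str, path: FrozenSet[Mutation], sample: FrozenSet[Mutation],
           best: SharedBest) -> None:
    # The shared bound only decreases, so a stale read is merely too large
    # (less pruning), never too small (wrong answer).
    cost = bounded_cost(path, sample, best.bound())
    if cost is not None:
        best.offer(name, cost)


def place(paths: Dict[str, FrozenSet[Mutation]], sample: FrozenSet[Mutation],
          workers: int = 4) -> Tuple[Optional[str], Optional[int]]:
    best = SharedBest()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_score, name, path, sample, best)
                   for name, path in paths.items()]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return best.node, best.score


# Hypothetical precomputed root-to-node mutation sets.
paths = {
    "root": frozenset(),
    "A": frozenset({(100, "T")}),
    "A1": frozenset({(100, "T"), (200, "G")}),
    "B": frozenset({(300, "C")}),
}
print(place(paths, frozenset({(100, "T"), (200, "G"), (400, "A")})))  # ('A1', 1)
```

Pruning uses a strict "exceeds" test, as in the quoted sentence. As a result, nodes that only tie the current best are still scored, which keeps every equally parsimonious placement visible.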
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
We emphasize that sequencing errors are likely to pose similar challenges for other placement tools, and our analysis is meant to serve as a guideline to the user rather than highlight the limitations of UShER.
Quantifying Uncertainty in Sample Placement: Quantifying uncertainty in phylogenetic placement is critical for accurately interpreting SARS-CoV-2 phylogenies, where true phylogenetic signal is limited and sometimes even contradictory [36,37]. We developed functionality within UShER to report the number of equally parsimonious placements by default. Additionally, UShER can output the minimum number of additional mutations required to accommodate a single sample placed on each branch of the reference tree, a measure which we call the Branch Parsimony Score (BPS). We limit this function to single sample placements because it would be challenging to quantify and to represent the uncertainty imposed by the sequential incorporation of additional samples. As would be expected given the typically unambiguous sample placements for high-quality sequences on the global phylogeny, BPS typically increases rapidly with increasing distance along the tree (e.g., Figure 3).
UShER is Consistent with Standard Phylogenetics Methods Using Real SARS-CoV-2 Data: To evaluate the performance of our approach under realistic conditions with genuine SARS-CoV-2 data, we used UShER to place real samples onto a global reference phylogeny. Because the phylogeny was necessarily inferred from real data (see ...
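The two uncertainty measures quoted above can be illustrated together. One is the number of placements tied for the minimum parsimony score; the other is a per-branch score for a single sample. The Python sketch below is a simplification, not UShER's definition of BPS. It scores only attachment at the end of each branch (keyed by the branch's child node, with "root" meaning attachment at the root), so it ignores placements partway along a branch. It also assumes no site mutates twice along a root-to-leaf path. The names `Node`, `branch_scores`, and `equally_parsimonious` are hypothetical. In this toy tree the sample ties between two sibling branches, and the score is highest on the branch farthest from the best placements.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Mutation = Tuple[int, str]  # hypothetical (position, alternate base)


@dataclass
class Node:
    name: str
    branch_mutations: FrozenSet[Mutation] = frozenset()  # mutations on incoming branch
    children: List["Node"] = field(default_factory=list)


def branch_scores(root: Node, sample: FrozenSet[Mutation]) -> Dict[str, int]:
    """Simplified per-branch parsimony score, keyed by the branch's child node.

    Score = extra mutations needed if the single sample is attached at the
    end of that branch: |root-to-node mutations XOR sample mutations|.
    """
    scores: Dict[str, int] = {}
    stack: List[Tuple[Node, FrozenSet[Mutation]]] = [(root, frozenset())]
    while stack:
        node, inherited = stack.pop()
        path = inherited | node.branch_mutations
        scores[node.name] = len(path ^ sample)
        stack.extend((child, path) for child in node.children)
    return scores


def equally_parsimonious(scores: Dict[str, int]) -> List[str]:
    """All placements tied for the minimum score, sorted by name."""
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Toy tree: root -> A (100T) -> {A1 (200G), A2 (300C)}; root -> B (400A).
root = Node("root", children=[
    Node("A", frozenset({(100, "T")}), [
        Node("A1", frozenset({(200, "G")})),
        Node("A2", frozenset({(300, "C")})),
    ]),
    Node("B", frozenset({(400, "A")})),
])
sample = frozenset({(100, "T"), (200, "G"), (300, "C")})
scores = branch_scores(root, sample)
print(scores)
print(equally_parsimonious(scores))  # ['A1', 'A2']
```

Every branch score comes from the same single traversal, so reporting the tie count adds almost nothing beyond computing the scores. This fits reporting the number of equally parsimonious placements by default.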
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
