CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

Roux-Cil Ferreira
Emmanuel Wong
Gopi Gugan
Kaitlyn Wade
Molly Liu
Laura Muñoz Baena
Connor Chato
Bonnie Lu
Abayomi S Olabode
Art F Y Poon

Abstract

Phylogenetics has played a pivotal role in the genomic epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), for example in tracking the emergence and global spread of variants and in scientific communication. However, the rapid accumulation of genomic data from around the world, with over two million genomes currently available in the Global Initiative on Sharing All Influenza Data (GISAID) database, is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2 and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into ‘variants’, draw 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ, which are then converted into a majority-rule consensus tree for each lineage. Branches with support values below 50 per cent or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly sampled ancestral variants. Currently, we process about 2 million genomes in approximately 9 h on 52 cores. The resulting trees are visualized using the JavaScript framework D3.js as ‘beadplots’, in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu.
All source code was released under an MIT license at https://github.com/PoonLab/covizu.
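The variant-collapsing and distance steps described in the abstract can be illustrated with a minimal Python sketch. This is not the CoVizu source code: the feature encoding (tuples of type, position, and value), the function names, and the toy data are all assumptions made for illustration. The sketch also assumes that a feature's bootstrap weight is the number of times it is drawn when the feature set union is resampled with replacement. Tree building in RapidNJ and the consensus step are omitted.

```python
import random
from collections import Counter, defaultdict


def collapse_variants(genomes):
    """Group sample labels by identical feature sets (one group = one 'variant')."""
    variants = defaultdict(list)
    for label, features in genomes.items():
        variants[frozenset(features)].append(label)
    return dict(variants)


def bootstrap_weights(union, rng):
    """Resample the feature union with replacement; a feature's weight is
    the number of times it was drawn (0 if it was never drawn)."""
    pool = sorted(union, key=str)  # fixed order so a seeded rng is reproducible
    return Counter(rng.choices(pool, k=len(pool)))


def weighted_symdiff(a, b, weights):
    """Sum the weights of features present in exactly one of the two sets."""
    return sum(weights[f] for f in a ^ b)


def distance_matrix(feature_sets, weights):
    """Symmetric matrix of weighted symmetric differences between variants."""
    n = len(feature_sets)
    dmx = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d = weighted_symdiff(feature_sets[i], feature_sets[j], weights)
            dmx[i][j] = dmx[j][i] = d
    return dmx


# Hypothetical toy data: '~' marks a substitution, '-' a deletion.
genomes = {
    "sample_A": {("~", 240, "T"), ("~", 3036, "T")},
    "sample_B": {("~", 240, "T"), ("~", 3036, "T")},
    "sample_C": {("~", 240, "T"), ("-", 21764, 6)},
}

variants = collapse_variants(genomes)       # sample_A and sample_B collapse
feature_sets = list(variants)               # one frozenset per variant
union = frozenset().union(*feature_sets)    # all features seen in the lineage

rng = random.Random(1)
# One distance matrix per bootstrap replicate (100, as in the abstract).
matrices = [
    distance_matrix(feature_sets, bootstrap_weights(union, rng))
    for _ in range(100)
]
```

In this sketch each of the 100 matrices would be passed to a neighbor-joining implementation, and the resulting trees summarized as a consensus. Storing variants as frozensets keeps the symmetric difference a single set operation, which is one plausible reason a feature-set representation scales better than comparing aligned sequences.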

Version published to 10.1093/ve/veab092 (Nov 8, 2021)
ScreenIT evaluation (Jul 26, 2021)
SciScore for 10.1101/2021.07.20.453079:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics not detected.
Sex as a biological variable not detected.
Randomization not detected.
Blinding not detected.
Power Analysis not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Version published to 10.1101/2021.07.20.453079 on bioRxiv (Jul 21, 2021)
