Rapid whole genome sequence typing reveals multiple waves of SARS-CoV-2 spread

Abstract

As the pandemic SARS-CoV-2 virus has spread globally, its genome has diversified to the extent that distinct clones can now be recognized, tracked, and traced. Identifying clonal groups allows assessment of geographic spread and transmission events, and identification of new or emerging strains that may be more virulent or more transmissible. Here we present a rapid, whole-genome, allele-based method (GNUVID) for assigning sequence types to sequenced SARS-CoV-2 isolates. This sequence typing scheme can be updated with new genomic information extremely rapidly, making our technique continually adaptable as databases grow. We show that our method is consistent with phylogeny and recovers waves of expansion and replacement of sequence types/clonal complexes in different geographical locations.

GNUVID is available as a command line application (https://github.com/ahmedmagds/GNUVID).
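
To make the idea of allele-based sequence typing concrete, here is a minimal Python sketch. It is not GNUVID's code or interface: the class `AlleleTyper`, the helper `has_ambiguity`, and the locus names are hypothetical, and the numbering rule (each new ORF sequence gets the next allele number, each new combination of allele numbers gets the next sequence type) is an assumption about how such schemes generally work. Skipping genomes with bases other than A, T, G, and C follows the exclusion rule quoted from the manuscript in the report below.

```python
from typing import Dict, List, Optional, Tuple

VALID_BASES = set("ACGT")


def has_ambiguity(seq: str) -> bool:
    """Return True if the sequence contains any base other than A, C, G, T."""
    return any(base not in VALID_BASES for base in seq.upper())


class AlleleTyper:
    """Toy allele-based typing scheme (hypothetical, not GNUVID's code)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        # Per-locus map from ORF sequence to allele number, numbered from 1.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        # Map from allelic profile to sequence type (ST), numbered from 1.
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def assign(self, orfs: Dict[str, str]) -> Optional[int]:
        """Return the ST for one genome, or None if any ORF is ambiguous."""
        seqs = [orfs[locus].upper() for locus in self.loci]
        # Check every ORF before registering anything, so an excluded
        # genome never adds alleles to the scheme.
        if any(has_ambiguity(seq) for seq in seqs):
            return None
        # A previously unseen sequence at a locus receives the next number.
        profile = tuple(
            self.alleles[locus].setdefault(seq, len(self.alleles[locus]) + 1)
            for locus, seq in zip(self.loci, seqs)
        )
        # A previously unseen allelic profile receives the next ST number.
        return self.profiles.setdefault(profile, len(self.profiles) + 1)
```

Because new alleles and profiles are simply appended to the lookup tables, a scheme built this way can absorb new genomes without recomputing earlier assignments, which is one plausible reading of the "extremely rapidly" updating described in the abstract.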

Article activity feed

  1. SciScore for 10.1101/2020.06.08.139055:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms

    Sentence: The 10 ORFs were identified in the remaining 16,866 genomes using blastn [16] and any genome that had any ambiguity or degenerate bases (any base other than A,T,G and C) in the 10 open reading frames (ORF) was excluded.
    Resource: blastn
    Suggested: (BLASTN, RRID:SCR_001598)

    Sentence: Temporal plots were extracted using a custom script and plotted in GraphPad Prism v7.0a.
    Resource: GraphPad Prism
    Suggested: (GraphPad Prism, RRID:SCR_002798)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    One limitation of any ST/CC classification strategy is that paraphyletic groups can occur as a new ST arises from an older ST (e.g. CC301 emerged from CC255, making CC255 paraphyletic). While this means not all ST/CC groups will be monophyletic, this property of the nomenclature may be helpful in gauging emergence and replacement of an ancestral form. To further validate our wgMLST classification system we compared it to the recently proposed “dynamic lineages nomenclature” for SARS-CoV-2 using the pangolin application [1]. A high percentage of viruses (90.5%; 40-100%) with the same CC were assigned to the same lineage. When sublineages of the dominant lineage designation were included, this average rose to 99% (89-100%), showing strong agreement between these classification schemes (Supplementary Table 2). Because we included collection dates for each genomic sequence, we can use STs and CCs to better understand the emergence and replacement of certain lineages in certain geographical regions over time. Figure 2A shows temporal plots of the 12 most common CCs around the world. This makes clear the emergence of new CCs over time such as CC255, CC300 and CC258. CC4, the earliest CC, started by representing 60% of sequenced genomes in mid-January, but had dropped to only 5% by mid-March. Of course, relative proportions of STs or CCs isolated and sequenced may be a highly biased statistic that is contingent upon where the isolate comes from, the decision to sequence its genome, and...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
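
The agreement check quoted in the LimitationRecognizer excerpt above, the percentage of viruses with the same CC that were assigned to the same lineage, can be illustrated with a short Python sketch. This is an assumed reading of that sentence, not the authors' script: the function name `lineage_concordance` is hypothetical, "the same lineage" is taken to mean the most common lineage within each CC, and the sublineage-inclusive variant reported in the excerpt is not covered.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def lineage_concordance(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """For each CC, percent of genomes assigned to that CC's most common lineage.

    `pairs` holds one (clonal_complex, lineage) tuple per genome.
    """
    by_cc: Dict[str, Counter] = defaultdict(Counter)
    for cc, lineage in pairs:
        by_cc[cc][lineage] += 1
    return {
        # most_common(1) returns [(lineage, count)] for the dominant lineage.
        cc: 100.0 * counts.most_common(1)[0][1] / sum(counts.values())
        for cc, counts in by_cc.items()
    }


def mean_concordance(per_cc: Dict[str, float]) -> float:
    """Unweighted mean of the per-CC percentages."""
    return sum(per_cc.values()) / len(per_cc)
```

Given a list of per-genome (CC, lineage) labels, `lineage_concordance` returns one percentage per CC and `mean_concordance` summarizes them, which mirrors the "average with range" form in which the excerpt reports its figures. Whether the authors averaged per CC or per genome is not stated in the excerpt, so the unweighted mean here is an assumption.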