Phylogenomics reveals viral sources, transmission, and potential superinfection in early-stage COVID-19 patients in Ontario, Canada

This article has been Reviewed by the following groups

Read the full article

Abstract

The emergence and rapid global spread of SARS-CoV-2 demonstrates the importance of infectious disease surveillance, particularly during the early stages. Viral genomes can provide key insights into transmission chains and pathogenicity. Nasopharyngeal swabs were obtained from thirty-two of the first SARS-CoV-2 positive cases (March 18–30) in Kingston Ontario, Canada. Viral genomes were sequenced using Ion Torrent (n = 24) and MinION (n = 27) sequencing platforms. SARS-CoV-2 genomes carried forty-six polymorphic sites including two missense and three synonymous variants in the spike protein gene. The D614G point mutation was the predominate viral strain in our cohort (92.6%). A heterozygous variant (C9994A) was detected by both sequencing platforms but filtered by the ARTIC network bioinformatic pipeline suggesting that heterozygous variants may be underreported in the SARS-CoV-2 literature. Phylogenetic analysis with 87,738 genomes in the GISAID database identified global origins and transmission events including multiple, international introductions as well as community spread. Reported travel history validated viral introduction and transmission inferred by phylogenetic analysis. Molecular epidemiology and evolutionary phylogenetics may complement contact tracing and help reconstruct transmission chains of emerging diseases. Earlier detection and screening in this way could improve the effectiveness of regional public health interventions to limit future pandemics.

Article activity feed

  1. SciScore for 10.1101/2020.06.25.171744: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statementnot detected.
    Randomizationnot detected.
    Blindingnot detected.
    Power Analysisnot detected.
    Sex as a biological variablenot detected.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    Filtered variants were used for sample clustering with the Maximum Likelihood Tree19 in Molecular Evolutionary Genetics Analysis (MEGA) software20,21
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    The ggtree (v2.0.1) and ggplot2 (v3.2.1) packages in R were used to generate tree visualizations.
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There were several limitations in using genomics to trace viral origin and path of transmission of SARS-CoV-2. First, in the early stages of the pandemic, most countries were only screening and testing international travelers who displayed symptoms. This allowed asymptomatic carriers to go undetected masking potential sources of infection. As a result, gaps exist in the databases reporting SARS-CoV-2 genomes masking the true sources of infection. However, the rapid publication of genome sequences from around the world can help to offset this limitation, by identifying geographical clusters and specific genomic variants that are shared across regions. The relatively high mutation rate helps to distinguish primary and secondary infections in the span of one to two weeks. A second limitation is a lack of detailed travel and interaction histories for patients due to differences in reporting and data collection among collection sites and agencies. Rigorous adherence to standardized data collection protocols, like WHO’s guidance for contact tracing in the context of COVID-19 coupled with genomics data as described here, may facilitate effective contact tracing that is required to break the chains of viral transmission. A final limitation is that sequencing SARS-CoV-2 in COVID-19 cases with low viral load was problematic due to lack of RNA input for library construction as seen by the number of excluded samples and those with low coverage uniformity (Samples 19 and 21). However, thi...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.