Genomic epidemiology of the first two waves of SARS-CoV-2 in Canada

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    This study measures the transmission of SARS-CoV-2 lineages in Canada and the rate at which lineages were imported into Canada from other countries during the first year of the pandemic. This information is critical for understanding basic SARS-CoV-2 evolution and epidemiology, but the impacts of sampling biases in space and time might weaken the conclusions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

This article has been reviewed by the following group: eLife.


Abstract

Tracking the emergence and spread of SARS-CoV-2 lineages using phylogenetics has proven critical to inform the timing and stringency of COVID-19 public health interventions. We investigated the effectiveness of international travel restrictions in reducing SARS-CoV-2 importations and transmission in Canada during the first two waves, in 2020 and early 2021. Maximum likelihood phylogenetic trees were used to infer the viruses' geographic origins, enabling identification of 2263 (95% confidence interval: 2159–2366) introductions, including 680 (658–703) Canadian sublineages, which are international introductions resulting in sampled Canadian descendants, and 1582 (1501–1663) singletons, which are introductions with no sampled descendants. Of the sublineages seeded during the first wave, 49% (46–52%) originated from the USA and were primarily introduced into Quebec (39%) and Ontario (36%); in the second wave, the USA was still the predominant source (43%), alongside larger contributions from India (16%) and the UK (7%). Following the implementation of restrictions on the entry of foreign nationals on 21 March 2020, importations declined 10.3-fold (8.3–15.0) within 4 weeks, from 58.5 (50.4–66.5) sublineages per week. Despite this drastic reduction in viral importations, newly seeded sublineages in summer and fall 2020 contributed to the persistence of COVID-19 cases in the second wave, highlighting the importance of sustained interventions to reduce transmission. Importations rebounded further in November, bringing newly emergent variants of concern (VOCs). By the end of February 2021, an estimated 30 (19–41) B.1.1.7 sublineages had been imported into Canada, and these increasingly displaced previously circulating sublineages by the end of the second wave. Although viral importations are nearly inevitable when global prevalence is high, fewer importations mean fewer opportunities for novel variants to spark outbreaks or outcompete previously circulating lineages.
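The fold-change statement can be unpacked with simple arithmetic. This is a minimal sketch, assuming the 10.3-fold decline is applied to the pre-restriction point estimate of 58.5 sublineages per week; the confidence bounds are not propagated, because combining two intervals this way would require the underlying replicate estimates. It also checks that sublineages plus singletons reproduce the reported total, where an off-by-one difference is presumably due to rounding.

```python
def implied_post_rate(pre_rate: float, fold_decline: float) -> float:
    """Weekly importation rate implied by a fold decline from pre_rate."""
    return pre_rate / fold_decline

pre_rate = 58.5   # sublineages per week before 21 March 2020 (point estimate)
fold = 10.3       # fold decline within 4 weeks (point estimate)
post_rate = implied_post_rate(pre_rate, fold)
print(f"Implied rate after restrictions: {post_rate:.1f} sublineages/week")

# Consistency check: sublineages + singletons vs. reported total introductions.
sublineages, singletons, total = 680, 1582, 2263
print(f"Sum of components: {sublineages + singletons} (reported total: {total})")
```

Under this reading, the post-restriction rate is roughly 5.7 sublineages per week.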

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This study produces conservative estimates of the rates of SARS-CoV-2 importation into Canada through February 2021. The study also estimates the relative rates of intra-provincial, inter-provincial, and international transmission by province. Because these rates are investigated over time periods with varying types of non-pharmaceutical interventions, the results provide foundational information on the impact of NPIs and rates of spread to and within Canada. These rates provide useful benchmarks for other regions and deepen our understanding of the natural history of SARS-CoV-2.

    Aside from a few places where speculation is unexpectedly mixed with careful data interpretation, the main limitation of the paper appears to be the unclear impact of sampling biases on the results. These biases occur inside and outside Canada. As the authors note, sequences are missing entirely from many countries and time periods where there was surely transmission. The analysis takes steps to mitigate this problem, but it is not clear how much distortion might remain. It is also unclear whether preferential testing or sequencing of specimens from recent travelers occurred and how strong this preference was (relative to sampling "random" community cases) in different places and times.

    These limitations are shared by many other phylogeographical analyses, but they raise the question of how literally the quantitative estimates and confidence intervals should be interpreted. My intuition is that some are much more robust than others, but this is left as an exercise.

    We have elaborated in the Discussion on the high level of uncertainty surrounding the exact estimates of importations. Throughout, we have emphasized that the relative dynamics are more important than the absolute estimates. Our confidence intervals may underestimate the true level of uncertainty.

    "Discussion

    Low sequence representation can lead to underestimates of total introductions if neither the index case nor its descendants were sampled, underestimates of sublineage size if not all descendants were sampled, and, similarly, overestimates of the proportion of singletons, which may have belonged to unsampled transmission chains. Extrapolating an upper estimate of introductions is challenging in the absence of additional data. Clean genomes available in Canada prior to 1 March 2021 represented 4.2% of confirmed diagnoses (and 3.2% when 75% of Canadian sequences were retained). Diagnoses were estimated to represent about 9% of total cases in Canada up to September 2020, while other geographies ranged from 5% in Italy to 99% in Qatar (Noh & Danuser, 2021). The probability of a case being detected is affected by geography (sociodemographic structure, testing capacity and recommendations), by individual (age, contact-tracing status, political beliefs, co-morbidities), and by lineage (symptom severity, infectivity profile). The reason for sequencing is not always random (it could be for an outbreak investigation or to confirm VOC identity), and it varies over time by jurisdiction. As more sequences are generated and made available, we expect to find more descendants of previously identified sublineages than travellers, or their recent contacts, harbouring new sublineages or singletons. When sequencing efforts or resources are limited, sequencing travellers is a more efficient use of resources if prevalence is higher abroad than domestically, which increases the travel bias. Thus, importations do not scale linearly with sequence representation.
In theory, the upper limit of importations by province could be estimated by adjusting for monthly sequence representation, case ascertainment rate, outbreak bias (the ratio of the probabilities of testing given infection for random versus outbreak-linked cases), and travel bias (the ratio of the probabilities of testing given infection for domestic versus travelling populations) over time, stratified by geography. More consistent inclusion of the reason for sequencing and testing in publicly available metadata could support better estimates of the extent of travel-related and outbreak-related bias. Additionally, prospective cohort studies or seroprevalence studies would improve our estimate of the case ascertainment fraction."
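The adjustment described in the quoted passage can be illustrated for a single stratum (one province in one month). This is a hypothetical, naive inverse-probability sketch and not the authors' method. It assumes that the probability an imported infection ends up sequenced equals case ascertainment × sequence representation, divided by the travel bias as defined above (domestic over travelling testing probability), with overall ascertainment standing in for the domestic testing probability. Outbreak bias is omitted. The example reuses the 9% ascertainment and 4.2% representation figures quoted above, which cover different time windows; the travel bias of 0.5 and the count of 10 observed importations are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """Hypothetical inputs for one province-month stratum."""
    observed_importations: int   # introductions detected in the phylogeny
    ascertainment: float         # fraction of infections that are diagnosed
    representation: float        # fraction of diagnoses that are sequenced
    travel_bias: float           # P(test | infected, domestic) / P(test | infected, travelling)

def detection_probability(s: Stratum) -> float:
    """Assumed probability that an imported infection is sequenced."""
    if s.travel_bias <= 0:
        raise ValueError("travel_bias must be positive")
    # Assumption: ascertainment approximates the domestic testing probability,
    # so travellers are tested 1/travel_bias times as often; a travel bias
    # below 1 therefore raises their detection probability.
    p = s.ascertainment * s.representation / s.travel_bias
    return min(p, 1.0)

def adjusted_importations(strata: list[Stratum]) -> float:
    """Sum inverse-probability-scaled importations across strata."""
    total = 0.0
    for s in strata:
        p = detection_probability(s)
        if p == 0:
            raise ValueError("ascertainment and representation must be positive")
        total += s.observed_importations / p
    return total

example = Stratum(observed_importations=10, ascertainment=0.09,
                  representation=0.042, travel_bias=0.5)
print(f"{detection_probability(example):.5f}")  # 0.00756
print(f"{adjusted_importations([example]):.0f}")
```

Because ascertainment and travel bias would differ by stratum, the summed estimate does not scale linearly with overall sequence representation. This matches the caveat in the quoted Discussion, and it shows how sensitive such extrapolations are to the assumed inputs.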

    Reviewer #2 (Public Review):

    In this article, entitled "early introductions of SARS-CoV-2 sublineages into Canada drove the 2020 epidemic", McLaughlin et al. analyze genetic patterns in a large set of publicly available SARS-CoV-2 sequences to characterize COVID-19 introductions and spread throughout Canada early in the pandemic. The authors conclude that a majority of viral introductions into Canada can be traced to the United States via Quebec and Ontario. In addition, they report a reduction in viral importation into Canada following the implementation of travel restrictions and other public health measures to reduce spread. The authors speculate that more rapid implementation of border controls and quarantine might have significantly reduced the COVID-19 disease burden in Canada, at least early in the pandemic.

    Although many similar genomic epidemiology studies using SARS-CoV-2 data have been published, this is the first major study focused on Canada at a national scale. The authors download a large dataset from GISAID and use appropriate tools and methods to clean and subsample this dataset. They appropriately acknowledge the limitations of their dataset as a small subset of the total Canadian case counts. Although the work is largely retrospective, the authors argue and I agree that this work can be valuable in evaluating the effectiveness of public health interventions to reduce viral importation and spread and therefore can be informative of ongoing public health measures and useful in comparing viral dynamics to the present time (future work).

    While I believe it is ultimately worthy of publication, this article can be strengthened in a few key areas. Primarily, the authors do not assess the robustness of their results against alternative subsampling schemes. They subsample their global sequences proportionally to case counts, but retain all Canadian sequences. As a result, their dataset is skewed heavily toward sequences collected during the winter and spring of 2020, which is not representative of case counts or of case distribution. Additionally, the study focuses primarily on international importations, with very limited analysis of, or perspective on, the role of person-to-person spread within Canada.

    Overall, this study deploys a set of tools used by many others in a new and important geographic region of North America. The authors draw important, although retrospective, conclusions about the drivers of the COVID-19 pandemic in 2020 in Canada, and conclude that reducing international travel and imposing quarantine requirements were important measures to reduce spread.

    We thank the reviewer for their perspective on how we could better account for the sampling bias arising from our focus on Canadian sequences, and for the extent of domestic transmission by province and sublineage. For the former, we have undertaken a sensitivity analysis in which the Canadian sequences are subsampled at 25%, 50%, and 75%, which we believe addresses the bias regarding the relative contributions of domestic transmission and international importations. For the latter, we have added figures and a description of the domestic circulation of dominant sublineages during the first two waves.
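A sensitivity analysis of this kind can be sketched as repeated random subsampling of the Canadian sequences at fixed retention fractions. This is a minimal sketch under several assumptions that the response does not confirm: the international sequences are held fixed, subsampling is uniform at random without replacement (not stratified by province or month), and the sequence identifiers are hypothetical. Downstream tree inference is not shown.

```python
import random

def subsample_canadian(canadian_ids, global_ids, fraction, seed):
    """Keep a random `fraction` of Canadian sequences plus all global ones."""
    rng = random.Random(seed)
    k = round(len(canadian_ids) * fraction)
    kept = rng.sample(canadian_ids, k)  # sampling without replacement
    return kept + list(global_ids)

# Hypothetical sequence identifiers for illustration.
canadian = [f"CAN_{i}" for i in range(1000)]
background = [f"INTL_{i}" for i in range(400)]

for fraction in (0.25, 0.50, 0.75):
    for replicate in range(3):
        dataset = subsample_canadian(canadian, background, fraction, seed=replicate)
        n_can = sum(1 for s in dataset if s.startswith("CAN_"))
        print(f"fraction={fraction:.2f} replicate={replicate} "
              f"Canadian={n_can} total={len(dataset)}")
```

Repeating each fraction over several seeds gives a spread of downstream estimates per fraction, which is one way to express how strongly the results depend on the share of Canadian sequences.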

    Reviewer #3 (Public Review):

    The authors present a comprehensive description of the importation and transmission dynamics of SARS-CoV-2 during the early stages of the COVID-19 epidemic in Canada. They implement phylodynamic analyses on a rich genomic data set generated within the country, contrasted with a vast collection of publicly available SARS-CoV-2 sequences from across the globe. Due to the vast quantities of genomic data available for this virus, they apply a downsampling scheme to generate a computationally manageable set of sequences for analysis: this set includes all of the (high-quality) sequences generated within Canada and a selection of sequences from other countries, proportional to the monthly reported COVID-19 cases in each of those countries. Following this step, the authors use a series of phylogenetic and phylogeographic methods to explore the number of importations of the virus into the country, the sources of these importations, and the recipient provinces in Canada. They also characterise the sublineages that result from these importations (i.e., importations that result in onward transmission), particularly regarding their size, duration, and circulation between provinces.

    The authors make good use of an abundant collection of SARS-CoV-2 genome sequences collected across Canada, providing one of the most in-depth panoramas of the spatiotemporal spread of the virus in the country during 2020. While not all Canadian provinces are represented within the data set, it is evident that those containing the largest urban areas and the main international travel hubs are included. The characterisation of the sublineages that emerge from the inferred importation events is very comprehensive and highlights how the largest importation peak of 2020 preceded the implementation of non-pharmaceutical interventions, while also showing that introductions continued at considerably lower levels during the months when these interventions remained stringent. The authors also show how most of these earlier sublineages became 'inactive' (i.e., extinct or no longer represented in the country's genomic surveillance), while a small proportion of the earlier introductions remained active for longer timespans. The exploration of the main hubs where importations were detected (Quebec and Ontario), and of the role these provinces played in seeding transmission lineages across other Canadian provinces, provides an interesting picture of the domestic transmission dynamics of SARS-CoV-2.

    The authors' attempt to identify the international sources of importation faces some challenges arising from the vastly heterogeneous sequencing efforts of different countries over time. Phylogeographic methods have long been known to be sensitive to sampling bias; this is particularly the case for the COVID-19 pandemic, where key territories showed well-documented underreporting of both new cases and viral genome sequences, likely introducing gaps in the available genomic data. The authors choose an interesting approach to address this bias, informing their downsampling by the monthly COVID-19 cases reported by the Johns Hopkins University Center for Systems Science and Engineering (through the 'coronavirus' R package). This approach likely accounts for some of the sampling bias between countries, but the lack of validation tests for the method, and the lack of external confirmation of these results through complementary data sources, warrant careful interpretation of these findings and their associated uncertainty. Beyond the available sequence data, case reporting (e.g., the data collected by the JHU-CSSE) has also been found to be heterogeneous across countries, particularly where diagnostic scale-up did not keep pace with local epidemic trends. These biases are less likely to affect some of the main identified sources of importation, such as the USA, but their effects for other locations will probably vary.

    Regarding the potential bias toward identifying the USA as an importation source:

    Although the USA was highly represented in all of our subsamples, owing to its large contribution to COVID-19 cases in 2020 and its high sequence availability during early months, our results suggest a greater effect than expected from sampling alone. On average, USA sequences represented 28.9% (28.7–29.2%) of total international sequences, yet accounted for 46.3% (44.0–48.7%) of all sublineages and 57.7% (55.6–59.8%) of singletons. When we maximized the number of Canadian sequences in the analysis, so that global sequence representation was more normalized but less comprehensive, USA sequences represented a smaller share of the international sequences (25.8%, 25.6–26.1%) yet still accounted for 38.4% (37.0–39.8%) of sublineages and 46.4% (44.6–48.3%) of singletons.
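The comparison in this response can be summarized as an enrichment ratio: a source's share of inferred sublineages (or singletons) divided by its share of the international sequences. A ratio above 1 means the source contributed more introductions than its sequence share alone would predict. The sketch below applies this arithmetic to the point estimates quoted above; the ratio is an illustrative summary rather than a statistic reported in the response, and it ignores the confidence intervals.

```python
def enrichment(share_of_events: float, share_of_sequences: float) -> float:
    """Ratio of a source's share of introductions to its share of sequences."""
    return share_of_events / share_of_sequences

# Point estimates (%) for USA sequences quoted in the response.
scenarios = {
    "average across subsamples": {"sequences": 28.9, "sublineages": 46.3, "singletons": 57.7},
    "maximized Canadian sequences": {"sequences": 25.8, "sublineages": 38.4, "singletons": 46.4},
}

for name, s in scenarios.items():
    sub = enrichment(s["sublineages"], s["sequences"])
    sing = enrichment(s["singletons"], s["sequences"])
    print(f"{name}: sublineages {sub:.2f}x, singletons {sing:.2f}x")
```

In both scenarios all four ratios fall between about 1.5 and 2.0, which is the pattern the response describes as a greater effect than sampling alone would produce.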

    While individual reports of the early epidemics in specific provinces have been published, this is the first nationwide analysis of the early COVID-19 epidemic in Canada. Given the geographical location and size of the country, these findings are key to understanding the early phases of the COVID-19 pandemic; they also add to the growing body of evidence describing the effects of multiple seeding events on the persistence of an epidemic caused by a respiratory pathogen, the speed at which such a pathogen can spread across large distances, and the changes in transmission dynamics that accompany behavioural changes in human populations (in this case, driven by public health interventions). It is also worth highlighting that the downsampling approach used by the authors to generate a computationally manageable data set could be usefully applied to other contexts, following deeper exploration and validation.

  2. Evaluation Summary:

    This study measures the transmission of SARS-CoV-2 lineages in Canada and the rate at which lineages were imported into Canada from other countries during the first year of the pandemic. This information is critical for understanding basic SARS-CoV-2 evolution and epidemiology, but the impacts of sampling biases in space and time might weaken the conclusions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    This study produces conservative estimates of the rates of SARS-CoV-2 importation into Canada through February 2021. The study also estimates the relative rates of intra-provincial, inter-provincial, and international transmission by province. Because these rates are investigated over time periods with varying types of non-pharmaceutical interventions, the results provide foundational information on the impact of NPIs and rates of spread to and within Canada. These rates provide useful benchmarks for other regions and deepen our understanding of the natural history of SARS-CoV-2.

    Aside from a few places where speculation is unexpectedly mixed with careful data interpretation, the main limitation of the paper appears to be the unclear impact of sampling biases on the results. These biases occur inside and outside Canada. As the authors note, sequences are missing entirely from many countries and time periods where there was surely transmission. The analysis takes steps to mitigate this problem, but it is not clear how much distortion might remain. It is also unclear whether preferential testing or sequencing of specimens from recent travelers occurred and how strong this preference was (relative to sampling "random" community cases) in different places and times. These limitations are shared by many other phylogeographical analyses, but they raise the question of how literally the quantitative estimates and confidence intervals should be interpreted. My intuition is that some are much more robust than others, but this is left as an exercise.

  4. Reviewer #2 (Public Review):

    In this article, entitled "early introductions of SARS-CoV-2 sublineages into Canada drove the 2020 epidemic", McLaughlin et al. analyze genetic patterns in a large set of publicly available SARS-CoV-2 sequences to characterize COVID-19 introductions and spread throughout Canada early in the pandemic. The authors conclude that a majority of viral introductions into Canada can be traced to the United States via Quebec and Ontario. In addition, they report a reduction in viral importation into Canada following the implementation of travel restrictions and other public health measures to reduce spread. The authors speculate that more rapid implementation of border controls and quarantine might have significantly reduced the COVID-19 disease burden in Canada, at least early in the pandemic.

    Although many similar genomic epidemiology studies using SARS-CoV-2 data have been published, this is the first major study focused on Canada at a national scale. The authors download a large dataset from GISAID and use appropriate tools and methods to clean and subsample this dataset. They appropriately acknowledge the limitations of their dataset as a small subset of the total Canadian case counts. Although the work is largely retrospective, the authors argue and I agree that this work can be valuable in evaluating the effectiveness of public health interventions to reduce viral importation and spread and therefore can be informative of ongoing public health measures and useful in comparing viral dynamics to the present time (future work).

    This article can be strengthened in a few key areas. Primarily, the authors do not assess the robustness of their results against alternative subsampling schemes. They subsample their global sequences proportionally to case counts, but retain all Canadian sequences. As a result, their dataset is skewed heavily toward sequences collected during the winter and spring of 2020, which is not representative of case counts or of case distribution. Additionally, the study focuses primarily on international importations, with very limited analysis of, or perspective on, the role of person-to-person spread within Canada.

    Overall, this study deploys a set of tools used by many others in a new and important geographic region of North America. The authors draw important, although retrospective, conclusions about the drivers of the COVID-19 pandemic in 2020 in Canada, and conclude that reducing international travel and imposing quarantine requirements were important measures to reduce spread.

  5. Reviewer #3 (Public Review):

    The authors present a comprehensive description of the importation and transmission dynamics of SARS-CoV-2 during the early stages of the COVID-19 epidemic in Canada. They implement phylodynamic analyses on a rich genomic data set generated within the country, contrasted with a vast collection of publicly available SARS-CoV-2 sequences from across the globe. Due to the vast quantities of genomic data available for this virus, they apply a downsampling scheme to generate a computationally manageable set of sequences for analysis: this set includes all of the (high-quality) sequences generated within Canada and a selection of sequences from other countries, proportional to the monthly reported COVID-19 cases in each of those countries. Following this step, the authors use a series of phylogenetic and phylogeographic methods to explore the number of importations of the virus into the country, the sources of these importations, and the recipient provinces in Canada. They also characterise the sublineages that result from these importations (i.e., importations that result in onward transmission), particularly regarding their size, duration, and circulation between provinces.
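The downsampling scheme described here (all Canadian sequences, plus international sequences allocated in proportion to each country's monthly reported cases) can be sketched as a quota allocation. This is a minimal sketch under stated assumptions: the country labels, case counts, sequence availability, and total international budget are all hypothetical; quotas are rounded down and capped at the number of sequences available; and unused budget is not redistributed. The actual pipeline may differ in any of these choices.

```python
import math
import random

def allocate_quotas(cases, available, budget):
    """Split `budget` sequences across (country, month) strata by case share."""
    total_cases = sum(cases.values())
    quotas = {}
    for stratum, n_cases in cases.items():
        target = math.floor(budget * n_cases / total_cases)
        # A stratum cannot contribute more sequences than were deposited.
        quotas[stratum] = min(target, len(available.get(stratum, [])))
    return quotas

def downsample(cases, available, budget, seed=0):
    """Draw the allocated number of sequences at random from each stratum."""
    rng = random.Random(seed)
    quotas = allocate_quotas(cases, available, budget)
    return {s: rng.sample(available[s], q) for s, q in quotas.items() if q > 0}

# Hypothetical monthly case counts and available sequence IDs.
cases = {("A", "2020-03"): 9000, ("B", "2020-03"): 900, ("C", "2020-03"): 100}
available = {
    ("A", "2020-03"): [f"A_{i}" for i in range(500)],
    ("B", "2020-03"): [f"B_{i}" for i in range(2000)],
    ("C", "2020-03"): [f"C_{i}" for i in range(5)],
}
print(allocate_quotas(cases, available, budget=1000))
```

The cap illustrates the reviewer's concern: where sequencing lagged behind reported cases (countries A and C above), a stratum cannot reach its case-proportional share, so case-proportional sampling only partly corrects for heterogeneous sequencing effort.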

    The authors make good use of an abundant collection of SARS-CoV-2 genome sequences collected across Canada, providing one of the most in-depth panoramas of the spatiotemporal spread of the virus in the country during 2020. While not all Canadian provinces are represented within the data set, it is evident that those containing the largest urban areas and the main international travel hubs are included. The characterisation of the sublineages that emerge from the inferred importation events is very comprehensive and highlights how the largest importation peak of 2020 preceded the implementation of non-pharmaceutical interventions, while also showing that introductions continued at considerably lower levels during the months when these interventions remained stringent. The authors also show how most of these earlier sublineages became 'inactive' (i.e., extinct or no longer represented in the country's genomic surveillance), while a small proportion of the earlier introductions remained active for longer timespans. The exploration of the main hubs where importations were detected (Quebec and Ontario), and of the role these provinces played in seeding transmission lineages across other Canadian provinces, provides an interesting picture of the domestic transmission dynamics of SARS-CoV-2.

    The authors' attempt to identify the international sources of importation faces some challenges arising from the vastly heterogeneous sequencing efforts of different countries over time. Phylogeographic methods have long been known to be sensitive to sampling bias; this is particularly the case for the COVID-19 pandemic, where key territories showed well-documented underreporting of both new cases and viral genome sequences, likely introducing gaps in the available genomic data. The authors choose an interesting approach to address this bias, informing their downsampling by the monthly COVID-19 cases reported by the Johns Hopkins University Center for Systems Science and Engineering (through the 'coronavirus' R package). This approach likely accounts for some of the sampling bias between countries, but the lack of validation tests for the method, and the lack of external confirmation of these results through complementary data sources, warrant careful interpretation of these findings and their associated uncertainty. Beyond the available sequence data, case reporting (e.g., the data collected by the JHU-CSSE) has also been found to be heterogeneous across countries, particularly where diagnostic scale-up did not keep pace with local epidemic trends. These biases are less likely to affect some of the main identified sources of importation, such as the USA, but their effects for other locations will probably vary.

    While individual reports of the early epidemics in specific provinces have been published, this is the first nationwide analysis of the early COVID-19 epidemic in Canada. Given the geographical location and size of the country, these findings are key to understanding the early phases of the COVID-19 pandemic; they also add to the growing body of evidence describing the effects of multiple seeding events on the persistence of an epidemic caused by a respiratory pathogen, the speed at which such a pathogen can spread across large distances, and the changes in transmission dynamics that accompany behavioural changes in human populations (in this case, driven by public health interventions). It is also worth highlighting that the downsampling approach used by the authors to generate a computationally manageable data set could be usefully applied to other contexts, following deeper exploration and validation.

  6. SciScore for 10.1101/2021.04.09.21255131:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms (sentence, resource, suggested identifier)
    • Sentence: "Maximum likelihood phylogenetic inference: We used the subsampled alignment to infer an approximate maximum likelihood (ML) phylogeny using FastTree version 2.2.141 with a generalized time-reversible substitution model with random starting tree seeds."
      Resource: FastTree; suggested: (FastTree, RRID:SCR_015501)
    • Sentence: "We estimated the time of most recent common ancestor (tMRCA) for introduction nodes using the bootstrap time-scaled trees inferred in LSD2 via IQ-TREE 2.1.2 with a relaxed molecular clock and 100 bootstrap trees to estimate 95% confidence intervals."
      Resource: IQ-TREE; suggested: (IQ-TREE, RRID:SCR_017254)
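The entry above reports 95% confidence intervals for tMRCA estimated from 100 bootstrap time-scaled trees. One common way to turn such replicates into an interval is the percentile method, sketched below. It assumes that each bootstrap tree yields one tMRCA for a given introduction node, expressed here in decimal years. This is an illustrative sketch only: it is not necessarily how LSD2 or IQ-TREE compute their intervals, and the replicate values are simulated.

```python
import random

def percentile(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 100) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def bootstrap_ci(estimates, level=95.0):
    """Percentile interval from bootstrap replicate estimates."""
    values = sorted(estimates)
    tail = (100 - level) / 2
    return percentile(values, tail), percentile(values, 100 - tail)

# Simulated tMRCA values (decimal years) standing in for 100 bootstrap trees.
rng = random.Random(1)
replicates = [rng.gauss(2020.15, 0.02) for _ in range(100)]
low, high = bootstrap_ci(replicates)
print(f"tMRCA 95% CI: {low:.3f} to {high:.3f}")
```

With only 100 replicates, the 2.5th and 97.5th percentiles rest on a handful of extreme trees, so such intervals are themselves somewhat noisy.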

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.