Phylodynamics of SARS-CoV-2 in France, Europe, and the world in 2020

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors explore how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, at this time this work suffers from a number of serious analytical shortcomings, all of which can be overcome with major revisions and re-analysis.

Abstract

Although France was one of the European countries most affected by the COVID-19 pandemic in 2020, the dynamics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) movement within France, and between France and the rest of Europe and the world, remain only partially characterized for this period. Here, we analyzed sequences deposited in GISAID from January 1 to December 31, 2020 (n = 638,706 sequences at the time of writing). To handle this large number of sequences without the bias of analyzing a single subsample, we produced 100 subsamples of sequences and the related phylogenetic trees from the whole dataset for different geographic scales (worldwide, European countries, and French administrative regions) and time periods (from January 1 to July 25, 2020, and from July 26 to December 31, 2020). We applied a maximum likelihood discrete trait phylogeographic method to date exchange events (i.e., transitions from one location to another) and to estimate the geographic spread of SARS-CoV-2 transmissions and lineages into, from, and within France, Europe, and the world. The results revealed two different patterns of exchange events between the first and second halves of 2020. Throughout the year, Europe was systematically associated with most of the intercontinental exchanges. SARS-CoV-2 was mainly introduced into France from North America and Europe (mostly from Italy, Spain, the United Kingdom, Belgium, and Germany) during the first European epidemic wave. During the second wave, exchange events were limited to neighboring countries without strong intercontinental movement, but Russia widely exported the virus into Europe during the summer of 2020. France mostly exported the B.1 and B.1.160 lineages during the first and second European epidemic waves, respectively. At the level of French administrative regions, the Paris area was the main exporter during the first wave. During the second epidemic wave, however, it contributed to virus spread equally with the Lyon area, the second most populated urban area in France after Paris. The main circulating lineages were similarly distributed among the French regions. To conclude, by enabling the inclusion of tens of thousands of viral sequences, this original phylodynamic method allowed us to robustly describe SARS-CoV-2 geographic spread through France, Europe, and worldwide in 2020.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The authors use what is potentially a novel method for bootstrapping sequence data to evaluate the extent to which SARS-CoV-2 transmissions occurred between regions of the world, between France and other European countries, and between some distinct regions within France. Data from the first two waves of SARS-CoV-2 in Europe were considered, from 2020 into January 2021. The paper provides more detail about the specific spread of the virus around Europe, specifically within France, than other work in this area of which I am aware.

    First of all, we would like to thank reviewer #1 for their evaluation and their various comments which, in our opinion, have allowed us to considerably improve the manuscript.

    An interesting facet of the methodology used is the downsampling of sequence data, generating multiple bootstraps each of around 500-1000 sequences and conducting analysis on each one. This has the strength of sampling, in total, a large number of sequences, while reducing the overall computational cost of analysis on a database that contains in total several hundred thousand sequences. A question I had about the results concerns the extent of downsampling versus the rate of viral migration: If between-country movements are rapid, a reduced sample could be misleading, for example characterising a transmission path from A to B to C as being from A to C by virtue of missing data. I acknowledge that this would be a problem with any phylogeographic analysis relying on limited data. However, in this case, how does the rate of migration between locations compare to the length of time between samples in the reduced trees? Along these lines, I was unclear to what extent the reported proportions of intra- versus inter-regional transmissions (e.g. line 223) would be vulnerable to sampling effects.
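    The A to B to C concern raised above can be made concrete with a toy example. The sketch below is purely illustrative: the case names, the locations, and the `exchange_events` helper are hypothetical and are not taken from the manuscript. It reads exchange events off a chain of sampled cases and shows that dropping the intermediate sample replaces two events with a single, direct one.

```python
def exchange_events(chain):
    """Return (source, destination) pairs between consecutive sampled
    cases whose locations differ."""
    locations = [location for _, location in chain]
    return [(a, b) for a, b in zip(locations, locations[1:]) if a != b]


# Hypothetical transmission chain: (case id, sampling location).
full_chain = [("case1", "A"), ("case2", "B"), ("case3", "C")]

# Downsampling that happens to drop the intermediate case sampled in B.
sparse_chain = [case for case in full_chain if case[0] != "case2"]

full_events = exchange_events(full_chain)
sparse_events = exchange_events(sparse_chain)

print(full_events)    # [('A', 'B'), ('B', 'C')]
print(sparse_events)  # [('A', 'C')]
```

    In a real phylogeographic analysis the events are inferred on a tree rather than on a linear chain, but the same loss of intermediate locations can occur along any branch whose intermediate samples are excluded.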

    This question is indeed a very important one. The between-country movement rate can be high, but the contagious period for a SARS-CoV-2-infected individual is short (a bit less than two weeks on average). In our subsamples, the dated trees have a median branch length of around 20 days. To ensure that our subsampling did not introduce errors in estimating the exchange events between locations, we conducted a simulation. Briefly, we generated a tree of 1,000,000 tips with a five-state discrete trait. We then took 100 subsampled trees of 1,000 leaves each, reconstructed the ancestry for the discrete trait, and assessed the transitions between states. The error rate is less than 3% on average: it comprises the missing data, as you pointed out, and the errors in reconstructing the ancestry for the trait deeper in the tree.

    We think that, overall, an error rate of less than 3% is satisfactory.

    The results of this specific simulation were added to the paper (lines 150-157) and as Figure 2—figure supplement 1.
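    For readers who want to experiment with the idea of this check, the following is a minimal sketch, not the authors' simulation. It assumes a much smaller random tree (2,000 tips instead of 1,000,000), substitutes Fitch parsimony for the maximum likelihood reconstruction, counts undirected state changes rather than directed transitions, and uses an arbitrary per-branch change probability. All function names and parameter values are hypothetical. It compares, for each subsample, the number of changes recovered by parsimony with the number that truly occurred on the branches connecting the sampled tips.

```python
import random


def simulate_tree(n_tips, n_states, p_change, rng):
    """Random binary tree with a discrete trait evolved from the root.
    Tips have ids 0..n_tips-1; every parent id exceeds its children's ids."""
    children = [() for _ in range(n_tips)]
    active = list(range(n_tips))
    while len(active) > 1:
        a = active.pop(rng.randrange(len(active)))
        b = active.pop(rng.randrange(len(active)))
        children.append((a, b))
        active.append(len(children) - 1)
    state = [None] * len(children)
    state[-1] = rng.randrange(n_states)  # root is the last node created
    for node in range(len(children) - 1, -1, -1):  # parents before children
        for child in children[node]:
            if rng.random() < p_change:
                others = [s for s in range(n_states) if s != state[node]]
                state[child] = rng.choice(others)
            else:
                state[child] = state[node]
    return children, state


def count_changes(children, state, sampled):
    """Return (parsimony_changes, true_changes) on the subtree induced by
    the sampled tips, restricted to branches below their common ancestor."""
    sets = [None] * len(children)   # Fitch state sets; None = nothing sampled below
    below = [0] * len(children)     # true changes on kept branches below each node
    fitch = true_changes = 0
    for node in range(len(children)):  # children are always visited first
        if not children[node]:
            if node in sampled:
                sets[node] = {state[node]}
            continue
        kept = [c for c in children[node] if sets[c] is not None]
        below[node] = sum(below[c] + (state[c] != state[node]) for c in kept)
        if len(kept) == 1:
            sets[node] = sets[kept[0]]          # unary node: pass through
        elif len(kept) == 2:
            common = sets[kept[0]] & sets[kept[1]]
            if common:
                sets[node] = common
            else:
                sets[node] = sets[kept[0]] | sets[kept[1]]
                fitch += 1                      # one change needed here
            true_changes = below[node]          # last update is at the common ancestor
    return fitch, true_changes


rng = random.Random(0)
n_tips = 2000
children, state = simulate_tree(n_tips, n_states=5, p_change=0.05, rng=rng)

results = []
for _ in range(100):  # 100 subsamples of 200 tips each
    sampled = set(rng.sample(range(n_tips), 200))
    results.append(count_changes(children, state, sampled))

total_fitch = sum(f for f, _ in results)
total_true = sum(t for _, t in results)
print("fraction of true changes recovered:", total_fitch / max(1, total_true))
```

    Because parsimony returns the minimum number of changes compatible with the sampled tips, the recovered count can never exceed the true count; the gap reflects both changes hidden by missing samples and reconstruction errors. The fraction printed here is not comparable to the error rate reported in the response, which was obtained with a different method and a different metric.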

    A further question around the methodology was the use of an artificially high fixed clock rate in the phylogenetic analysis so as to date the tree in an unbiased way. Although I understood that the stated action led to the required results, given the time available for review I was unable to figure out why this should be so. Is this an artefact of under-sampling, or of approximations made in the phylogenetic inference? Is this a well-known phenomenon in phylogenetic inference?

    We thank reviewer #1, who, like reviewer #2 and the editor, was concerned by the use of an artificially fast and fixed molecular clock. It was a workaround for a mistake in our code, which has since been fixed. See the answer to point (3) of the editor.

    The value of this kind of research is highlighted in the paper, in that genomic data can be used to assess and guide public health measures (line 64). This work elucidates several facts about the geographical spread of SARS-CoV-2 within France and between European countries. The more clearly these facts can be translated into improved or more considered public health action, through the evaluation of previous policy actions, or through the explication of how future actions could lead to improved outcomes, the more this work will have a profound and ongoing impact.

    This is indeed a very interesting point to emphasize. We are currently discussing with public health specialists at our institution how to assess past public health actions using phylodynamic data in a statistically valid manner.

    Reviewer #2 (Public Review):

    This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020 and the authors should be congratulated for tackling this important question. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors provided compelling, although as of yet correlative and incomplete, evidence towards how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, however this work suffers from a number of serious analytical shortcomings, all of which can be overcome in a major revision and re-analysis.

    We would like to thank reviewer #2 for their evaluation and their various comments. We want to point out that reviewer #2 was contacted for advice on the molecular clock strategy, since she performed a study on a similar topic describing the SARS-CoV-2 epidemic in Canada during 2020. We strongly believe that all of reviewer #2's comments greatly contributed to improving the quality of this work.

    With this genomic epidemiology analysis, the authors disentangled the relative contributions of different geographic levels to transmission events in France and in Europe in the first two COVID-19 waves of 2020. By partitioning the analysis into three complementary, but distinct, geographic levels, the migration flows in and out of continents, countries in Europe, and regions in France were inferred using maximum likelihood ancestral state reconstruction. The major strengths of this paper were the inclusion of multiple geographic levels, the comparison of different rate symmetries in the ancestral character estimation, and the comprehensive qualitative descriptions of comparisons over time and geographies. However, there were also major weaknesses that need to be addressed and are described in more detail below. They include summing across replicates that were drawn with replacement and were not independent; inadequate justification for excluding underrepresented geographies; the assertion that positive correlation between intra-regional transmission and deaths validates the accuracy of the analysis; considering the framework the authors have chosen for this analysis the analysis would accommodate and benefit strongly from increasing the size of the sequence sets selected for analysis in each replicate; and the sparsity of quantitative (over qualitative or exploratory) comparisons and statistics in the reporting of results. In particular, it would greatly strengthen the paper if the authors could better evaluate the effect of travel restrictions on importations and exportations by testing hypotheses, quantifying changes in the presence of restrictions, or estimating inflection points in importation rates.

    We are grateful for this comprehensive listing of the strengths and weaknesses of our study. The limitations of this study are addressed in detail under each of the reviewer's specific remarks. We would like to emphasize that all the remarks and limitations reported here by reviewer #2 are, in our opinion, fully justified. We have therefore added further analyses (a study of the Pango lineages, averaging of the subsamples, and a simulation study to justify the sample size), modified the methodology (in particular concerning the molecular clock), and thoroughly rewritten the “Results” section.

    General comments on the Background: Need to elaborate on how this study fits into the big picture in the first paragraph. Should discuss how phylodynamics contributes to understanding of viral outbreaks, SARS-CoV-2 epidemiology and viral evolution.

    We have added in the “Introduction” section some elements to better understand why phylodynamics is an important field in the epidemiology of SARS-CoV-2 and its evolution.

    The authors should consider a hypothesis-driven framework for their analyses. For example, given the geographically central position of France, what hypotheses follow regarding the sources of viral importations and the destinations of exportations from/to Europe versus other international locations? Or other a priori expectations.

    We agree with reviewer #2 about this remark. Indeed, given the central position of France, we can hypothesize that it has strongly participated in the dissemination of the virus within Europe. This hypothesis has been included in the "Introduction" section of the revised version (lines 102-105).

    To address the computational limits of phylogenetic reconstruction, 100 replicates of fewer than 1000 sequences each were sampled for each epidemic wave at each level. The inter- and intra-regional transmissions were averaged and then summed across replicates in order to compare the relative roles played by each geography towards transmission. While we see the logic in using the sum across replicates, this is highly likely to bias results, especially since in the methods, this is described as sampling with replacement between replicates (LX). The validity of summing replicates needs to be discussed; the results are likely most appropriately presented as a mean or median. Also, these samples are quite small considering the computational capacity of the maximum likelihood tools being used. We recommend repeating the analysis with a substantially larger number of sequences per sample.

    We thank reviewer #2 for this relevant remark. We initially summed the subsamples, a strategy that may bias the results. In the new version of the manuscript, we averaged the subsamples by region and by week, as recommended (and as stated in the methods, lines 536-537).
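    The difference between summing and averaging across subsamples can be illustrated with a small sketch. The counts, the region and week labels, and the `combine` helper below are hypothetical and are not taken from the manuscript. The sketch assumes each subsample yields a table of exchange-event counts keyed by (region, week), and that a key absent from a subsample counts as zero.

```python
from collections import defaultdict


def combine(replicates, how):
    """Combine per-replicate event counts keyed by (region, week).
    how="sum" adds counts; how="mean" divides the sum by the number of
    replicates, treating a key missing from a replicate as zero."""
    totals = defaultdict(float)
    for replicate in replicates:
        for key, count in replicate.items():
            totals[key] += count
    if how == "sum":
        return dict(totals)
    return {key: total / len(replicates) for key, total in totals.items()}


# Hypothetical counts from two subsamples.
replicates = [
    {("region_A", "2020-W12"): 2, ("region_B", "2020-W12"): 1},
    {("region_A", "2020-W12"): 4},
]

summed = combine(replicates, "sum")
averaged = combine(replicates, "mean")
print(summed)    # {('region_A', '2020-W12'): 6.0, ('region_B', '2020-W12'): 1.0}
print(averaged)  # {('region_A', '2020-W12'): 3.0, ('region_B', '2020-W12'): 0.5}

# Doubling the number of replicates doubles the sum but leaves the mean unchanged.
summed_doubled = combine(replicates * 2, "sum")
averaged_doubled = combine(replicates * 2, "mean")
```

    The summed value scales with the number of subsamples drawn, which is an analysis choice rather than a property of the epidemic, whereas the mean stays on the scale of a single subsample.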

    Regarding the size of our subsamples, using 1,000, 2,000, or 5,000 genomes in each subsample made no difference. To obtain a more definitive and scientifically sound answer, we performed a simulation study that has been included in the manuscript and is shown in what is now Figure 2 (and Figure 2—figure supplement 1). These simulations show that our subsampling strategy allows for an accurate estimate of transition rates for a discrete parameter (lines 107-160).
