Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more effort is needed to make them convincing.

    1. Obtain stable estimate from multiple types of data:

    It turns out the stable estimates just apply the same approach repeatedly to different time series (Figure 3A middle). From the wording, I thought there would be some method to combine these time series into a single estimate of Rt. Overall, the Rt and the time series of infections should be unique. It would be suboptimal otherwise; for example, if there are big differences between the results from the death time series and the reported-case time series, which one should I trust?

    We think it is a strength to compute different Rt values based on different data, as this allows researchers, policy makers and the public alike to compare the information from different observation types directly. Any discrepancy between two Re trajectories (e.g. between the Re based on cases, Rcc(t), and that based on hospitalisations Rh(t)) is an indication to investigate which external variables (e.g. testing strategy) have changed. We have found it a great advantage when communicating and sharing our results outside of academia that we could point to these separately obtained Re estimates: if the estimates all agreed, more confidence could be given to them.

    If one wanted to produce a single estimate, this would require adopting a fundamentally different framework to estimate Re, which exceeds the scope of this work. One could use heuristics (weights representing the trustworthiness of a given source at a given time) to combine the various Re estimates into a single ensemble estimate. Alternatively, one could model the full underlying population dynamics (e.g. with a compartmental model including hospitalisation and death) and adopt a fully Bayesian approach to fitting such a model. However, both options require heuristics or priors that will vary substantially through time and per country (as discussed in the Supplementary Discussion), and thus limit how widely the pipeline can be applied.
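    For illustration, the heuristic option mentioned above could take the form of a simple trust-weighted average of the per-source estimates at each time point. This is a hypothetical sketch, not part of the pipeline; the function name, the weighting scheme and the fixed weights are all assumptions.

```python
def ensemble_re(estimates, weights):
    """Hypothetical trust-weighted ensemble of per-source Re estimates
    at a single time point. Not part of the published pipeline; in
    practice the weights would have to vary by country and over time,
    which is exactly the limitation discussed above."""
    # Normalise by the total weight of the sources actually present
    total = sum(weights[src] for src in estimates)
    return sum(weights[src] * val for src, val in estimates.items()) / total
```

    For example, weighting death-based estimates three times as heavily as case-based ones, `ensemble_re({'cases': 1.0, 'deaths': 2.0}, {'cases': 1, 'deaths': 3})` yields 1.75.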

    We have revised the manuscript to make clearer (early on) that we estimate multiple Re values from separate types of data (see also the response to reviewer 3, item #5). In addition, we now discuss more explicitly the advantages and disadvantages of showing these estimates separately (lines 281-290).

    1. Adequate representation of uncertainty:

    This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would also be expected that the proposed approach provides wider CIs and hence higher coverage. However, I think simulation studies are required to validate that the wider CI is the correct one. I think the most related result is Figure 1B. The results suggest that the approach works when Rt is not changing rapidly. However, I have concerns about the simulation methods (details below).

    Indeed, the difference in coverage between our method and EpiEstim is due to observation noise. We agree the CI from EpiEstim should be correct assuming that the infection incidence time series can be observed perfectly. However, in reality quite a bit of variability is introduced between infection and case observation: not only due to the delay from infection to observation, but also due to e.g. reduced testing capacity on weekends or reporting errors. To accurately assess the coverage of our method (and whether the CIs are too narrow or too wide) we need to include realistic amounts of observation noise in the simulations. This is why we add autocorrelated noise to our simulated observations, where this noise mimics observed residuals in Switzerland and other countries (Figs. S3, S4, S15, S17).

    We have now added explicit comparison to the EpiEstim confidence intervals to supplementary Fig. S4. In addition, we extended the corresponding method section to describe more extensively why and how we added observation noise to our simulations (lines 498-518; see also the detailed response to comment 4 below).

    1. Real-time properties of Rt

    There is no simulation about the real-time properties of Rt. The most related result is still Figure 1B. However, looking at the right tail of the figure (the real-time performance), the proportion of estimates covering the true value decreases, and more effort is needed to show that the framework can be accurate in real time. For example, what is the real-time performance when Rt is increasing, or when Rt decreases sharply due to a lockdown?

    As suggested, we included an additional simulation study to investigate the accuracy and stability of the last possible Re estimate. We present this analysis in a new results paragraph (subsection "Stability of Re estimates in an outbreak monitoring context"; line 121) and Figure S10. Using this analysis, we highlight the trade-off that exists between the timeliness of the Re estimates and their stability.

    1. Simulation methods to estimate Rt

    Both 2) and 3) need simulations to support the results, and hence the simulation approach is critical. The first part, generating an infection time series from a Poisson distribution, is OK. However, the issue is the second part: how the authors obtained the time series for deaths/hospitalisations/reported cases. To me, after generating the infection time series, those time series could be obtained using the delay distribution from infection to death/hospitalisation/report. I am not sure the authors' approach of using smoothing and fitting an ARIMA model to get those time series is correct.

    We believe there may have been some confusion about how our simulation set-up works, and we provided insufficient detail on the design decisions behind this set-up. We have added more explanation for both points to the paper (lines 503-518; additional supplementary Figs. S15-S17). In brief, our simulation process consists of three parts. We first conduct the two steps the reviewer also mentioned: (i) simulating the infection time series, and (ii) simulating the observed time series by using the delay distribution from infection to death/hospitalisation/case report.

    However, we find that the observations simulated this way are too smooth compared to real data (see Figure S17). Possible reasons are that the delay distribution does not account for weekend and holiday effects, for the random and occasional delays in recording confirmed cases, or for irregular components such as confirmed cases imported from abroad. We therefore added a noise term in our simulations, resulting in a third step: (iii) adding noise generated from an ARIMA model.
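    The three steps can be sketched as follows. This is an illustrative toy version, not the paper's actual code: the serial-interval and delay distributions, the normal approximation to the Poisson draw, and the AR(1) process standing in for the fitted ARIMA noise model are all assumptions made for brevity.

```python
import math
import random

def simulate_observations(re_traj, serial_interval, delay_dist, seed=1):
    """Illustrative three-step simulation: (i) infections from a Poisson
    renewal process, (ii) convolution with an infection-to-observation
    delay distribution, (iii) multiplicative autocorrelated noise.
    All parameter values are assumptions, not the paper's."""
    rng = random.Random(seed)
    # (i) renewal process: E[I(t)] = Re(t) * sum_s I(t-s) * w(s);
    # a normal approximation replaces the exact Poisson draw for brevity
    infections = [100.0]
    for t in range(1, len(re_traj)):
        lam = re_traj[t] * sum(
            infections[t - s] * w
            for s, w in enumerate(serial_interval, start=1)
            if t - s >= 0
        )
        infections.append(max(0.0, rng.gauss(lam, math.sqrt(max(lam, 1.0)))))
    # (ii) convolve infections with the delay distribution
    observed = [
        sum(infections[t - d] * p for d, p in enumerate(delay_dist) if t - d >= 0)
        for t in range(len(infections))
    ]
    # (iii) add autocorrelated observation noise on the log scale
    eps, phi, sigma = 0.0, 0.6, 0.1
    noisy = []
    for x in observed:
        eps = phi * eps + rng.gauss(0.0, sigma)
        noisy.append(x * math.exp(eps))
    return noisy
```

    Without step (iii) the observed series inherits only the smooth convolution of the infection curve, which is why simulated observations from steps (i)-(ii) alone look unrealistically regular.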

    To obtain a realistic ARIMA model for this third step, we fitted a model based on the confirmed case data for SARS-CoV-2 in Switzerland. Specifically, we first obtained the additive residuals based on the log-transformed confirmed cases. We then fitted ARIMA models of various orders and assessed the resulting ACF and PACF plots of their residuals. Based on this, we chose an ARIMA(2,0,1)(0,1,1) model. We refer to Figure S16 to support this: The first row shows the ACF and PACF plots of the original residuals, showing strong autocorrelation. The second row shows the ACF and PACF plots of the residuals after fitting the ARIMA model. We see that there is little autocorrelation left, indicating that this model is reasonable.
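    The diagnostic used above can be sketched as a sample autocorrelation function (a minimal stand-in for the ACF plots produced in R; `sample_acf` is a hypothetical helper, not the paper's code):

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation function of a series at lags 1..max_lag.
    Residuals of a well-fitting ARIMA model should give values close to
    zero (roughly within +/- 1.96/sqrt(n)); strong spikes, e.g. at lag 7
    for weekly reporting patterns, indicate remaining autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / (n * c0)
        for k in range(1, max_lag + 1)
    ]
```

    Applied first to the raw residuals and then to the residuals of the fitted model, the two resulting profiles correspond to the first and second rows of Figure S16.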

    In Figure S17, we present simulated observations based on all three steps, and one can see that they look more realistic than the simulated observations after step (ii).

    We would also like to point out that the ARIMA model is only used to obtain simulated observations. Our main method to estimate Re and obtain the related confidence intervals does not require fitting an ARIMA model.

    Minor comments:

    1. What does near real-time mean? Are the estimates of Rt delayed by a few days, as in other approaches?

    Indeed, the estimates of Rt are delayed by the time it takes from infection to a case to be observed. We have replaced the term “near real-time” by “timely” throughout the manuscript, and added this explanation of the delay more explicitly to the text (line 86).

    1. For the results in Table 1, I think results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach would better illustrate its value.

    We have improved and extended the comparison of our method against others in two ways: (i) we added further comparison of the coverage of our method vs. that of EpiEstim to Fig. S4 (see also the response to major comment 2), and (ii) we added comparison against different commonly used pipelines (see minor comment 3 below). Instead of comparing to other approaches, the analysis in Table 1 was meant to illustrate the use of the Re estimates resulting from our method alone.

    1. I think more discussion is needed of the similarities and differences with current approaches. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

    We added a section to the results (paragraph starting line 182; Fig. 3), dedicated to comparing our approach with relevant alternatives. We compared some of our empirical results with the estimates published on epiforecasts.io (based on EpiNow2 package from Abbott et al.), as well as official COVID-19 Re estimates for Austria (by AGES) and Germany (by RKI). We find that estimates published by the RKI and AGES health authorities are likely to be overconfident and to suffer from previously-identified biases (notably in Gostic et al., 2020, PLOS Computational Biology). We provide a detailed comparison of the features and approaches of these methods (EpiNow2, AGES, RKI), with the addition of the epidemia R-package (Supp File S2). This comparison highlights the unique features of the method developed: its ability to account for time-varying delay distributions and to combine symptom onset data with case data.

    1. Figure S11 is about accounting for known imports. If local cases are dominant, imported cases would not have a big impact on estimates of Rt. The impact of imported cases on Rt estimates can nonetheless be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming that imported and 'exported' cases cancel out, it is also assumed that imported cases have transmissibility similar to local cases, which may not be true if there is border control.

    We thank the reviewer for this interesting comment and reference. We added a brief discussion in the result section of the manuscript to address this limitation (lines 174-177).

    Reviewer #2 (Public Review):

    This manuscript describes an algorithm for estimating the real-time effective reproductive number R_e(t). This algorithm combines several methods in a reasonable way: deconvolution of the time series of reported cases into a time series of infections, a Poisson model for the generation of infections, and a block bootstrap of residuals to assess uncertainty. Each component is not necessarily novel, but the performance of this algorithm has been validated using comprehensive simulation studies. The algorithm was applied to COVID-19 surveillance data in selected countries across continents, revealing a great deal of heterogeneity in the association of R_e(t) with nonpharmaceutical interventions. Overall, the conclusions seem reliable.

    I have several moderate critiques and suggestions:

    1. From a statistical point of view, it seems much more natural to integrate the infection generation process and the delay from infection to reporting, possibly with reporting errors, into the same model, with which you would avoid combining the bootstrap and the credible intervals in a somewhat awkward way. I understand you can take advantage of the EpiEstim package, but the likelihood is very simple and easy to program up. Nevertheless, I'm not strongly against the current paradigm.

    We agree that such an integrated approach is useful and makes the uncertainty interval estimation more coherent. However, in such an integrated approach one cannot use the analytical solution for the likelihood, and methods that choose this approach (like EpiNow2 and epidemia) tend to pay for it in computational complexity. It also makes it harder to include time-varying delay distributions in the model, one aspect that sets our pipeline apart from existing alternatives.

    An additional advantage of our method is that estimates of the infection incidence are not influenced by priors on Re. In the case of a bad model fit, this allows us to identify more easily which part of the model may be misbehaving, and as such serves as a sanity check.

    Lastly, our framework has the advantage of modularity: pieces of the pipeline can be (and were) continuously refined or replaced with better pieces. This continuous improvement process allowed a flexible response to the pressing circumstances (the COVID-19 pandemic), and allowed us to extend it to entirely new types of proxy data (e.g., wastewater viral loads - https://ehp.niehs.nih.gov/doi/10.1289/EHP10050 ).

    1. Is there a strong reason to believe the residuals are autocorrelated? The block sampling with block size 10 seems arbitrary. The authors fitted an ARIMA model to the residuals for some countries; how good was the fit? If the block size doesn't matter, then probably the stronger but simpler assumption of independent residuals may not compromise the estimation of R_e(t) much.

    Yes, there is reason to believe the residuals are autocorrelated. New supplementary Figure S15 shows the ACF and PACF of the residuals based on the confirmed cases of Switzerland, China, New Zealand, France and the US, and one can see that for most countries, the obtained residuals are clearly autocorrelated. We added this point to the simulations method section in the paper (lines 503-518). Please also see our response to Reviewer 1, major comment 4 above.

    Choosing an optimal block size for the block bootstrap method is generally difficult. To capture weekly patterns, we need a block size of at least 7. We tried different sizes and found that 10 tended to work well in a variety of simulation settings (an example is given in Fig. S19).
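    A moving-block bootstrap of the residuals might be sketched like this (an illustrative sketch under the block-size choice discussed above, not the pipeline's implementation):

```python
import random

def block_bootstrap(residuals, block_size=10, seed=0):
    """Resample a residual series in contiguous blocks, preserving the
    autocorrelation structure (e.g. weekly patterns, hence block_size >= 7)
    within each block. Blocks are drawn with replacement from all
    overlapping windows and concatenated until the original length is
    reached."""
    rng = random.Random(seed)
    n = len(residuals)
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_size + 1)
        resampled.extend(residuals[start:start + block_size])
    return resampled[:n]
```

    With independent resampling (block size 1), any weekly pattern in the residuals would be destroyed; blocks of 10 keep short-range dependence intact while still shuffling the series.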

    1. I don't see the necessity of using a segmented R_e(t) instead of a smooth curve in the simulation studies. The inferential performance, especially the coverage of the CIs, is much less satisfactory when a segment has a steep slope. The authors may consider constructing splines based on the segments or using basis functions directly.

    We started using a segmented Re(t) trajectory to allow for simple parametric generation of different scenarios (e.g. in new Fig. S10), and to specifically study our ability to estimate sudden transitions in Re (discussed with respect to Table 1, Fig. S2). We agree this approach makes our method look worse than necessary, since it is generally difficult to estimate such abrupt changes in Re. However, we considered it the more stringent test of our method, as we will perform better on any smoother trajectory.

    1. The authors smoothed the log-transformed observed incidences to come up with the residuals. For Poisson data, the variance-stabilising transformation is the square root, not the logarithm. In addition, as you already have bootstrap estimates, why not use quantiles directly for the CIs instead of a normal (asymptotic) approximation? When incidence is low, the normal approximation may be much less satisfactory. Also, when using a normal approximation for the CI, it is much safer to calculate the standard deviation and construct the CI on the log scale, i.e., for log(θ̂*(t)), and then exponentiate back.

    Our goal in transforming the original case observations is to stabilise the variance of the residuals. Indeed, the square root transformation is generally recommended if the data to be transformed are Poisson distributed. In our case, however, the original case observations are not quite Poisson. Specifically, the infection incidence at time t given the past incidence is modelled with a Poisson process (see Section 4.4), but the case observations are modelled with an additional convolution step of the infection incidence with a delay distribution, and there is additional variation due to e.g. weekday effects. It is thus not clear a priori which transformation works best for our data, and we therefore investigated various possible transformations (including the square root transformation). We found that no transformation was uniformly best across the data of different countries, but that the log transformation tended to perform best overall. This is why we chose the log transformation. Please see the new supplementary Figure S14, where we show the residuals after the square root and log transformations for various countries.

    Regarding the bootstrap confidence intervals, we also investigated different options. Again it is not clear a priori which bootstrap confidence interval performs best for our data, so we compared common choices like quantile, reversed-quantile and normal-based intervals in a simulation study. Specifically, we assessed their coverage and found that the normal-based confidence intervals performed best overall (see Fig. S4).

    For low incidence settings, none of the bootstrap methods perform very well (as bootstrap consistency does not apply). We now mention this consideration in the paper (line 442).

    Finally, regarding the suggestion to compute exp(SD(log(X))): this quantity is generally different from SD(X), which we need for the confidence intervals. We also refer to the coverage shown in the various supplementary figures (e.g. S2, S4, S5) to support that our approach works well.
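    As a concrete illustration of the normal-based interval on the natural scale (a sketch, not the pipeline's code; the function name and the default 95% level are assumptions):

```python
import math
from statistics import NormalDist

def normal_bootstrap_ci(replicates, level=0.95):
    """Normal-approximation bootstrap CI: bootstrap mean +/- z * SD(X),
    computed on the natural scale of the replicates (here: bootstrapped
    Re values), as opposed to exponentiating SD(log(X))."""
    n = len(replicates)
    mean = sum(replicates) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in replicates) / (n - 1))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return mean - z * sd, mean + z * sd
```

    For replicates 0.9, 1.0, 1.1 (sample SD 0.1) this gives roughly (0.80, 1.20).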

    1. The stringency index is a convenient metric for intervention intensity. However, it doesn't reflect actual compliance, as the authors admit. Another, likely more pertinent metric is human movement (possibly multiple movement indices). Human movement indices may not be available in all countries, but they are available in some, e.g., the US, and for the first wave in China. In some states of the US, it was clear that human movement decreased substantially even before the initiation of lockdowns. The lack of human movement metrics has most likely contributed to the difficulty in interpreting Figure 4.

    We have added mobility data (from Apple and Google location data) to our general dashboard, and to the analysis shown in Fig. 5. The mobility traces give more detailed insight into the behaviour that may have led to decreases in Re. However, we find similar patterns with respect to decreases in Re as with the stringency index. A more extensive analysis that focuses on different phases of the pandemic may allow for more detailed insights, but we believe this is beyond the scope of our manuscript.

  2. Evaluation Summary:

    This paper presents an integrated algorithm, based on several standard techniques in infectious disease epidemiology, to estimate the real-time reproductive number, and shows how it has evolved in different countries during COVID-19. However, the analyses should be modelled in a more integrated fashion, uncertainty estimation requires more work, and additional data streams should be incorporated to more reliably capture infection dynamics.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  5. SciScore for 10.1101/2020.11.26.20239368:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: 4.1 EpiEstim: The method presented here builds upon the Re estimation method developed by Cori et al. [12], implemented in the EpiEstim R package.
    Resource: EpiEstim (suggested: EpiEstim, RRID:SCR_018538)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Because of this broad interest and the critical importance of Re estimates, it is crucial to communicate both the results as well as the associated uncertainty and caveats in an open, transparent and accessible way. This is why we display daily updated results on an online dashboard, accessible at https://ibz-shiny.ethz.ch/covid-19-re/. The dashboard shows Re estimates in the form of time series for each included country or region, and a global map containing the latest Re estimates and normalised incidence. For a number of European countries, we also display a timeline of the implementation and lifting of non-pharmaceutical interventions (NPIs). For all countries, we display a timeline of the stringency index of the Blavatnik School of Government. A unique advantage of the monitoring method we have developed is the parallel use of different types of observation data, all reflecting the same underlying infection process [6]. Wherever we have data of sufficient quality, we estimate Re separately based on confirmed cases, hospitalisations and death reports. The advantages and disadvantages of the different observation types are discussed in the Supplementary discussion 6.1. Comparing estimates from several types of data is a powerful way to evaluate the sensitivity of the results to the type of observations they were derived from. More generally, the method would be applicable to any other type of incidence data, such as admissions to intensive care units or excess death data. ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
