Inferring variant-specific effective reproduction numbers from combined case and sequencing data

Marlin D Figgins
Trevor Bedford

Curated by eLife

eLife Assessment

This study provides new important insights concerning pathogen variant-specific reproduction parameters from molecular sequencing and case finding. The methods for inferring which variants will likely emerge in subsequent epidemic cycles are solid. This article is of broad interest to infectious disease epidemiology researchers and mathematical modellers of the COVID-19 pandemic.

Abstract

Accurately estimating relative transmission rates of SARS-CoV-2 variants remains a scientific and public health priority. Recent studies have used the sample proportions of different variants from genetic sequence data to describe variant frequency dynamics and relative transmission rates, but frequencies alone cannot capture the rich epidemiological behavior of SARS-CoV-2. Here, we extend methods for inferring the effective reproduction number of an epidemic using confirmed case data to jointly estimate variant-specific effective reproduction numbers and frequencies of co-circulating variants using cases and sequences across states in the US from January 2021 to March 2022. Our method can be used to infer structured relationships between effective reproduction numbers across time series allowing us to estimate fixed variant-specific growth advantages. We use this model to estimate the effective reproduction number of SARS-CoV-2 Variants of Concern and Variants of Interest in the United States and estimate consistent growth advantages of particular variants across different locations.
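As a rough illustration of the idea in the abstract, the sketch below simulates a deterministic discrete-time renewal process for two co-circulating variants and derives their frequencies from the simulated infections. It is not the authors' implementation: the function names, the multiplicative form of the growth advantage (variant R_t = advantage × baseline R_t), the generation-time weights, and all numeric inputs are assumptions chosen for illustration.

```python
def simulate_variant_renewal(r_base, advantages, gen_time, seed_infections):
    """Deterministic renewal simulation for co-circulating variants.

    r_base: baseline R_t values, one per day (hypothetical input)
    advantages: dict variant -> multiplicative growth advantage (1.0 = baseline)
    gen_time: generation-time weights for lags 1..S, summing to 1
    seed_infections: dict variant -> infections on day 0
    """
    infections = {v: [seed_infections[v]] for v in advantages}
    for t in range(1, len(r_base)):
        for v, advantage in advantages.items():
            past = infections[v]
            # Infectious pressure: past infections weighted by the generation time.
            pressure = sum(
                g * past[t - s]
                for s, g in enumerate(gen_time, start=1)
                if t - s >= 0
            )
            # Assumed structure: variant R_t is a fixed multiple of the baseline R_t.
            past.append(advantage * r_base[t] * pressure)
    return infections


def variant_frequencies(infections):
    """Daily share of each variant among all simulated infections."""
    days = len(next(iter(infections.values())))
    freqs = {v: [] for v in infections}
    for t in range(days):
        total = sum(series[t] for series in infections.values())
        for v, series in infections.items():
            freqs[v].append(series[t] / total if total > 0 else 0.0)
    return freqs


# Illustrative inputs only; none of these values come from the paper.
infections = simulate_variant_renewal(
    r_base=[1.0] * 30,
    advantages={"baseline": 1.0, "variant": 1.5},
    gen_time=[0.2, 0.5, 0.3],
    seed_infections={"baseline": 100.0, "variant": 1.0},
)
freqs = variant_frequencies(infections)
```

Under these assumptions the variant's frequency rises over time even though the baseline R_t stays at 1. The sketch also shows why frequencies alone do not determine the case trajectory: the same frequency path is compatible with many baseline R_t curves.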

Version published to 10.7554/elife.104802.1 on eLife
Sep 3, 2025
Version published to 10.7554/elife.104802 on eLife
Sep 3, 2025
eLife
Sep 2, 2025

Reviewer #1 (Public review):

In this manuscript, the authors describe a new method to more accurately estimate the fitness advantage of new SARS-CoV-2 variants when they emerge. This was a key public health question during the pandemic and drove a number of important policy choices during the latter half of the acute phase of the pandemic. They attempt to link fitness to expected wave size. The analyses are tested on data from 33 different US states for which the data were considered sufficient. The main novelty of the method is that it links the frequency of variants to the number of cases and thus estimates fitness in terms of the reproduction number.

The new method appears to yield more consistent estimates of fitness advantage over time, suggesting that it is more accurate than the comparator methods.

Given that the paper presents a methodological advancement, the absence of a simulation study is a weakness. I am satisfied that the trends estimated via the different approaches suggest a useful advancement for a difficult problem. However, the work would have been considerably stronger if synthetic data had been used to illustrate without doubt how the revised method better captures underlying, pre-specified differences in fitness.

eLife
Sep 2, 2025

Reviewer #2 (Public review):

Summary:

This study develops a joint epidemiological and population genetic model to infer variant-specific effective reproduction numbers (Rt) and growth advantages of SARS-CoV-2 variants using US case counts and sequence data (Jan 2021-Mar 2022). The authors use the widely used renewal equation framework with two observation models that both account for overdispersion: a zero-inflated negative binomial likelihood for case counts and a Dirichlet-multinomial likelihood for sequence counts. Rt is parameterized with a classic cubic spline basis expansion, and inference is Bayesian, specifically stochastic variational inference (SVI). I was reassured to see the sensitivity analysis on the generation time to check its effects on Rt.
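To make the role of the Dirichlet-multinomial likelihood mentioned above concrete, here is a minimal sketch of its log probability mass function, applied to variant sequence counts as an illustration. The parameterization (expected frequencies scaled by a single `concentration` value) and the function name are assumptions for this sketch and may differ from the model in the paper.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, freqs, concentration):
    """Log-probability of observed counts given expected frequencies.

    counts: observed sequence counts per variant
    freqs: expected variant frequencies (sum to 1)
    concentration: hypothetical overdispersion parameter; larger values
        approach a plain multinomial, smaller values allow more noise
    """
    alphas = [concentration * f for f in freqs]
    n = sum(counts)
    alpha_total = sum(alphas)
    # Normalizing term: n! * Gamma(A) / Gamma(n + A), on the log scale.
    logp = lgamma(n + 1) + lgamma(alpha_total) - lgamma(n + alpha_total)
    for x, a in zip(counts, alphas):
        # Per-category term: Gamma(x + a) / (x! * Gamma(a)), on the log scale.
        logp += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return logp


# Two variants, two sequences, one observed of each (illustrative values).
p_overdispersed = exp(dirichlet_multinomial_logpmf([1, 1], [0.5, 0.5], 2.0))
p_near_multinomial = exp(dirichlet_multinomial_logpmf([1, 1], [0.5, 0.5], 1e6))
```

With a low concentration the even split is less likely than under a multinomial (where it has probability 0.5), because extra probability mass moves to lopsided samples. This is the sense in which the likelihood tolerates noisier sequencing data.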

This is an incredibly robust study design. Integrating case and sequence data enables estimation of both absolute and relative variant fitness, overcoming limitations of frequency-only or case-only models. This reminds me of https://www.medrxiv.org/content/10.1101/2023.01.02.23284123v4.full

I also really appreciated the flexible and interpretable parameterization of the renewal equations with splines. But I may be biased since I really like splines!

The approach is justified, but it has some notable weaknesses, which I detail below.

(1) The model does not account for demographic stochasticity or transmission overdispersion (superspreading), which are known to affect SARS-CoV-2 dynamics and can bias Rt, especially in low incidence or early introduction phases.

(2) While the authors explore sensitivity to the generation time, the reliance on fixed generation time parameters (with some adjustments for Delta/Omicron) may still bias results.

(3) There is no explicit adjustment for population immunity, which limits the ability to isolate intrinsic variant fitness (even though the model allows for the inclusion of covariates). To me, this is one of two major flaws in the study.

(4) The second major flaw in my opinion is that there is no hierarchical pooling across states - each state is modeled independently. A hierarchical Bayesian model could borrow strength across states, improving estimates for states with sparse data and enabling more robust inference of shared variant effects.
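The "borrowing strength" idea in point (4) can be sketched with a simple normal-normal partial pooling calculation. This is only an approximation of what a full hierarchical Bayesian model would do: the between-state standard deviation is treated as known, the shared mean is a precision-weighted average, and every name and number below is hypothetical.

```python
def partial_pool(estimates, std_errors, between_state_sd):
    """Shrink per-state estimates toward a shared mean.

    estimates: per-state point estimates (e.g. log growth advantages)
    std_errors: per-state standard errors of those estimates
    between_state_sd: assumed (not estimated) spread of true state effects
    """
    tau2 = between_state_sd ** 2
    # Precision-weighted estimate of the mean shared across states.
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    shared_mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled = []
    for y, se in zip(estimates, std_errors):
        # Shrinkage near 0 keeps the state's own estimate;
        # shrinkage near 1 replaces it with the shared mean.
        shrink = se ** 2 / (se ** 2 + tau2)
        pooled.append((1.0 - shrink) * y + shrink * shared_mean)
    return shared_mean, pooled


# Three hypothetical states; the third has sparse data and a wide standard error.
state_estimates = [0.40, 0.50, 0.90]
state_std_errors = [0.05, 0.05, 0.40]
shared_mean, pooled = partial_pool(state_estimates, state_std_errors, 0.10)
```

In this toy example the sparse-data state is pulled most of the way toward the shared mean, while the two well-sampled states barely move.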

I would strongly recommend the following things in order of priority, where the first two points I consider critical.

(1) Implement a hierarchical model for variant growth advantages and Rt across states.

(2) Include time-varying covariates for vaccination rates, prior infection, and non-pharmaceutical interventions directly. This would help disentangle intrinsic variant transmissibility from changes in population susceptibility and behavior.

(3) Extend the renewal model to a stochastic or branching process framework that explicitly models overdispersed transmission.

(4) It would be good to allow for multiple seeding events per variant and per state. Phylogeography could inform this with minimal effort, and it would improve the accuracy of Rt.

(5) By now, I don't think it will be a surprise that addressing sampling bias is standard, for example by reweighting sequence data or by comparing results with independent surveillance data to assess the impact of non-representative sequencing.

SciScore for 10.1101/2021.12.09.21267544: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentence: Inference: The model is implemented in NumPyro [26] in Python and approximate Bayesian inference was conducted using Stochastic Variational Inference [27] using the ADAM optimizer [28] with a learning rate of 0.01.
Resources: NumPyro (suggested: None); Python (suggested: IPython, RRID:SCR_001658)

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

With this in mind, this work is not without limitations. The underlying transmission model is deterministic and does not account for demographic stochasticity and over-dispersion in transmission, which has been documented in SARS-CoV-2 transmission [17]. As with all methods which depend on parameterizations of the generation time, misspecification of the generation time can lead to biased estimates of the effective reproduction number or growth advantages [18]. In order to quantify this source of error, we derive an equation relating our inferred growth advantages, the epidemic growth rates, and the mean and standard deviation of the generation time distribution. This source of error can be partially combatted by converting effective reproduction numbers to their corresponding epidemic growth rates under the generation time assumption (see Supplement Appendix). There is also a general need to account for biases in the case data which may not faithfully describe the infection dynamics of SARS-CoV-2 due to changes in case ascertainment rate, as possibly caused by differences in testing intensity and infection severity, among other reasons. However, we suspect that case ascertainment remained largely consistent from January to October 2021. We do not explicitly model multiple introductions of variants which can play an important role in variants establishing themselves in different geographies at low infection counts and could bias our estimates of the effective reproduction number i...
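The conversion between effective reproduction numbers and epidemic growth rates mentioned above can be sketched as follows. The sketch assumes a gamma-distributed generation time, for which the standard Euler-Lotka relation gives R = (1 + r × scale)^shape. The paper's own equation is in its supplement and may take a different form, and the mean and standard deviation used here are illustrative values, not the paper's.

```python
def gamma_shape_scale(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def growth_rate_from_r(r_eff, mean, sd):
    """Exponential growth rate per day implied by R under a gamma generation time."""
    shape, scale = gamma_shape_scale(mean, sd)
    return (r_eff ** (1.0 / shape) - 1.0) / scale


def r_from_growth_rate(rate, mean, sd):
    """Inverse mapping: R implied by a growth rate under the same assumption."""
    shape, scale = gamma_shape_scale(mean, sd)
    return (1.0 + rate * scale) ** shape


# Same R, two hypothetical generation-time means (days), same sd.
rate_short = growth_rate_from_r(1.5, mean=4.0, sd=2.0)
rate_long = growth_rate_from_r(1.5, mean=6.0, sd=2.0)
round_trip = r_from_growth_rate(growth_rate_from_r(1.5, 5.0, 2.0), 5.0, 2.0)
```

Under this assumption the same R corresponds to faster growth when the generation time is shorter. That is one way to see why a misspecified generation time can bias inferred reproduction numbers and growth advantages, while the implied growth rates are less sensitive to it.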

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.12.09.21267544 on medRxiv
Dec 11, 2021
