Using multiple sampling strategies to estimate SARS-CoV-2 epidemiological parameters from genomic sequencing data
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The choice of viral sequences used in genetic and epidemiological analysis is important, as it can induce biases that detract from the value of these rich datasets. This raises questions about how a set of sequences should be chosen for analysis. We provide insights on these largely understudied problems using SARS-CoV-2 genomic sequences from Hong Kong, China, and the Amazonas State, Brazil. We consider multiple sampling schemes, which were used to estimate Rt and rt as well as the related R0 and date of origin parameters. We find that both Rt and rt are sensitive to changes in sampling, whilst R0 and the date of origin are relatively robust. Moreover, we find that analyses using unsampled datasets result in the most biased Rt and rt estimates for both our Hong Kong and Amazonas case studies. We highlight that sampling strategy choices may be an influential yet neglected component of sequencing analysis pipelines.
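As an illustration of what a temporal sampling scheme can look like in practice, the sketch below subsamples a list of dated sequences in two ways. The abstract does not name the schemes used, so both schemes here (a fixed cap per week, and a number proportional to weekly case counts) and all names (`uniform_scheme`, `proportional_scheme`, `per_week`, `fraction`, `cases_per_week`) are assumptions made for illustration, not the authors' pipeline.

```python
import random
from collections import defaultdict


def group_by_week(samples):
    """Group (sequence_id, week) pairs into a dict: week -> list of ids."""
    by_week = defaultdict(list)
    for seq_id, week in samples:
        by_week[week].append(seq_id)
    return by_week


def uniform_scheme(samples, per_week, seed=0):
    """Hypothetical scheme: keep at most `per_week` sequences from every week."""
    rng = random.Random(seed)
    by_week = group_by_week(samples)
    chosen = []
    for week in sorted(by_week):
        ids = by_week[week]
        chosen.extend(rng.sample(ids, min(per_week, len(ids))))
    return chosen


def proportional_scheme(samples, cases_per_week, fraction, seed=0):
    """Hypothetical scheme: keep `fraction` x reported cases per week,
    capped by the number of sequences actually available that week."""
    rng = random.Random(seed)
    by_week = group_by_week(samples)
    chosen = []
    for week in sorted(by_week):
        ids = by_week[week]
        target = round(fraction * cases_per_week.get(week, 0))
        chosen.extend(rng.sample(ids, min(target, len(ids))))
    return chosen
```

An "unsampled" analysis in this framing simply uses every available sequence, so weeks with heavy sequencing effort dominate the dataset regardless of how many cases occurred in them.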
Article activity feed
SciScore for 10.1101/2022.02.04.22270165:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (Sentences, followed by suggested Resources)
- Sentence: Using the Accession ID of each sequence, all sequences were screened and only sequences previously analysed and published in PubMed, MedRxiv, BioRxiv, virological or Preprint repositories were selected for subsequent analysis.
  Resources: PubMed, suggested: (PubMed, RRID:SCR_004846); BioRxiv, suggested: (bioRxiv, RRID:SCR_003933)
- Sentence: The gradient of the slopes (clock rates) provided by TempEst were used to inform the clock prior in the phylodynamic analysis.
  Resources: TempEst, suggested: (TempEst, RRID:SCR_017304)
- Sentence: Bayesian Evolutionary Analysis: Date molecular clock phylogenies were inferred for all sampling strategies applied to the Amazonas and Hong Kong dataset using BEAST v1.10.4 (Suchard et al., 2018) with BEAGLE library v3.1.0 (Ayres et al., 2019) for accelerated likelihood evaluation.
  Resources: BEAGLE, suggested: (BEAGLE, RRID:SCR_001789)
- Sentence: Subsequently, 10% of all trees were discarded as burn in, and the effective sample size of parameter estimates were evaluated using TRACER v1.7.2 (Rambaut et al., 2018).
  Resources: TRACER, suggested: (Tracer, RRID:SCR_019121)
- Sentence: Phylodynamic Reconstruction: Estimation of the Reproduction Number and Time-varying Effective Reproduction Number. The Bayesian birth-death skyline (BDSKY) model (Stadler et al., 2013) implemented within BEAST 2 v2.6.5 (Bouckaert et al., 2019) was used to estimate time-varying rates of epidemic transmission, measured as changes in Rt (Table 2).
  Resources: BEAST, suggested: (BEAST, RRID:SCR_010228)
- Sentence: The four independent MCMC runs were combined using LogCombiner v2.6.5. (Bouckaert et al., 2019) and the effective sample size of parameter estimates were evaluated using TRACER v1.7.2 (Rambaut et al., 2018).
  Resources: LogCombiner, suggested: (BEAST2, RRID:SCR_017307)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
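The post-processing step quoted in Table 2 (discarding 10% of samples as burn-in, then checking the effective sample size) can be sketched on a single parameter trace as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the ESS estimator shown (sum of autocorrelations truncated at the first non-positive lag) is a simple textbook variant, not the algorithm implemented in Tracer.

```python
def discard_burn_in(trace, fraction=0.10):
    """Drop the first `fraction` of an MCMC trace (10% by default)."""
    start = int(len(trace) * fraction)
    return trace[start:]


def effective_sample_size(trace):
    """Simple ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0.
    This is an illustrative estimator, not Tracer's implementation.
    """
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)  # constant trace: no autocorrelation to estimate
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

A strongly autocorrelated trace yields an ESS well below the number of retained samples, which is the signal that a chain needs to run longer or be combined with other independent runs.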
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: While our results provide a rigorous underpinning and insight into the dynamics of SARS-CoV-2 and the impact of sampling strategies in the Amazonas region and Hong Kong, there are limitations. The Skygrowth and BDSKY models do not explicitly consider imports into their respective regions. This is particularly relevant for Hong Kong as most initial sequences from the region were sequenced from importation events (Adam et al., 2020) which can introduce error into parameter estimation. However, as the epidemic expanded, more infections were attributable to autochthonous transmission (Adam et al., 2020), and the risk of error introduced by importation events decreased. Moreover, while sampling strategies can account for temporal variations in genomic sampling fractions there is currently no way to account for non-random sampling approaches in either the BDSKY or Skygrowth models (Vasylyeva et al., 2020). It is unclear how network-based sampling may affect parameter estimates obtained through these models (Volz, Koelle and Bedford, 2013) presenting a key challenge in molecular and genetic epidemiology. Spatial heterogeneities were also not explored within this work. This represents the next key step in understanding the impact of sampling as spatial sampling schemes would allow the reconstruction of the dispersal dynamics and estimation of epidemic overdispersion (k), a key epidemiological parameter. This work has highlighted the impact and importance that applying temporal sampli...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
