Using multiple sampling strategies to estimate SARS-CoV-2 epidemiological parameters from genomic sequencing data

Abstract

The choice of viral sequences used in genetic and epidemiological analyses is important, as it can induce biases that detract from the value of these rich datasets. This raises the question of how a set of sequences should be chosen for analysis. We provide insights into this largely understudied problem using SARS-CoV-2 genomic sequences from Hong Kong, China, and Amazonas State, Brazil. We consider multiple sampling schemes and use them to estimate Rt and rt, as well as the related R0 and date-of-origin parameters. We find that both Rt and rt are sensitive to changes in sampling, whilst R0 and the date of origin are relatively robust. Moreover, we find that analyses using unsampled datasets result in the most biased Rt and rt estimates for both our Hong Kong and Amazonas case studies. We highlight that sampling strategy choices may be an influential yet neglected component of sequencing analysis pipelines.
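
To make the idea of a temporal sampling scheme concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes two illustrative schemes: a "uniform" scheme that draws the same number of sequences from every week, and a "proportional" scheme that draws in proportion to reported cases. The scheme names, record layout, sequence IDs, and case counts are all hypothetical.

```python
import random
from collections import defaultdict


def group_by_week(records):
    """Group (sequence_id, week) records into {week: [sequence_id, ...]}."""
    by_week = defaultdict(list)
    for seq_id, week in records:
        by_week[week].append(seq_id)
    return by_week


def uniform_sample(records, per_week, seed=0):
    """Draw up to `per_week` sequences from every week (hypothetical scheme)."""
    rng = random.Random(seed)
    by_week = group_by_week(records)
    chosen = []
    for week in sorted(by_week):
        ids = by_week[week]
        chosen.extend(rng.sample(ids, min(per_week, len(ids))))
    return chosen


def proportional_sample(records, cases_per_week, fraction, seed=0):
    """Draw sequences in proportion to reported cases (hypothetical scheme)."""
    rng = random.Random(seed)
    by_week = group_by_week(records)
    chosen = []
    for week in sorted(by_week):
        ids = by_week[week]
        # Target count is a fixed fraction of that week's reported cases,
        # capped by the number of sequences actually available.
        target = round(fraction * cases_per_week.get(week, 0))
        chosen.extend(rng.sample(ids, min(target, len(ids))))
    return chosen


# Invented example data: 10, 40 and 5 sequences in weeks 1, 2 and 3.
records = [
    (f"seq_w{week}_{i}", week)
    for week, n in {1: 10, 2: 40, 3: 5}.items()
    for i in range(n)
]
cases_per_week = {1: 100, 2: 400, 3: 60}  # invented case counts

uniform = uniform_sample(records, per_week=5)
proportional = proportional_sample(records, cases_per_week, fraction=0.05)
```

With these invented inputs the uniform scheme flattens the epidemic curve, while the proportional scheme keeps the week 2 peak. A difference of this kind is what could plausibly shift time-varying estimates such as Rt and rt.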

Article activity feed

  1. SciScore for 10.1101/2022.02.04.22270165

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Using the Accession ID of each sequence, all sequences were screened and only sequences previously analysed and published in PubMed, MedRxiv, BioRxiv, virological or preprint repositories were selected for subsequent analysis.
        PubMed, suggested: (PubMed, RRID:SCR_004846)
        BioRxiv, suggested: (bioRxiv, RRID:SCR_003933)
    The gradients of the slopes (clock rates) provided by TempEst were used to inform the clock prior in the phylodynamic analysis.
        TempEst, suggested: (TempEst, RRID:SCR_017304)
    Bayesian Evolutionary Analysis: Dated molecular clock phylogenies were inferred for all sampling strategies applied to the Amazonas and Hong Kong datasets using BEAST v1.10.4 (Suchard et al., 2018) with BEAGLE library v3.1.0 (Ayres et al., 2019) for accelerated likelihood evaluation.
        BEAGLE, suggested: (BEAGLE, RRID:SCR_001789)
    Subsequently, 10% of all trees were discarded as burn-in, and the effective sample sizes of parameter estimates were evaluated using TRACER v1.7.2 (Rambaut et al., 2018).
        TRACER, suggested: (Tracer, RRID:SCR_019121)
    Phylodynamic Reconstruction: Estimation of the Reproduction Number and Time-varying Effective Reproduction Number. The Bayesian birth-death skyline (BDSKY) model (Stadler et al., 2013) implemented within BEAST 2 v2.6.5 (Bouckaert et al., 2019) was used to estimate time-varying rates of epidemic transmission, measured as changes in Rt (Table 2).
        BEAST, suggested: (BEAST, RRID:SCR_010228)
    The four independent MCMC runs were combined using LogCombiner v2.6.5 (Bouckaert et al., 2019) and the effective sample sizes of parameter estimates were evaluated using TRACER v1.7.2 (Rambaut et al., 2018).
        LogCombiner, suggested: (BEAST2, RRID:SCR_017307)
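
The post-processing steps quoted in the table (discarding 10% of samples as burn-in, combining independent MCMC runs, and checking the effective sample size) can be sketched in Python as follows. This illustrates the general idea under stated assumptions and is not what LogCombiner or Tracer does internally. The ESS estimator is a simple autocorrelation sum truncated at the first non-positive lag, and the four chains are synthetic AR(1) series rather than real BEAST output.

```python
import random


def discard_burn_in(chain, fraction=0.10):
    """Drop the first `fraction` of samples from a single chain."""
    n_drop = int(len(chain) * fraction)
    return list(chain[n_drop:])


def combine_runs(chains, fraction=0.10):
    """Remove burn-in from each chain, then concatenate what remains."""
    combined = []
    for chain in chains:
        combined.extend(discard_burn_in(chain, fraction))
    return combined


def effective_sample_size(chain):
    """Approximate ESS as n / (1 + 2 * sum of positive autocorrelations).

    Assumption: the sum stops at the first non-positive autocorrelation;
    dedicated tools may use a different estimator.
    """
    n = len(chain)
    mean = sum(chain) / n
    centred = [v - mean for v in chain]
    variance = sum(c * c for c in centred) / n
    if variance == 0.0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def synthetic_chain(rng, length, phi):
    """Generate an AR(1) series as a stand-in for a correlated MCMC trace."""
    chain = [rng.gauss(0.0, 1.0)]
    for _ in range(1, length):
        chain.append(phi * chain[-1] + rng.gauss(0.0, 1.0))
    return chain


rng = random.Random(1)
runs = [synthetic_chain(rng, 1000, 0.9) for _ in range(4)]  # four synthetic runs
combined = combine_runs(runs)

# ESS is computed per run and summed, so that the joins between
# independent runs do not distort the autocorrelation estimate.
ess_total = sum(effective_sample_size(discard_burn_in(r)) for r in runs)
print(len(combined), round(ess_total))
```

Because the synthetic chains are strongly autocorrelated, the summed ESS comes out far below the number of retained samples. That gap is the reason an ESS check follows the combination step.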

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    While our results provide a rigorous underpinning and insight into the dynamics of SARS-CoV-2 and the impact of sampling strategies in the Amazonas region and Hong Kong, there are limitations. The Skygrowth and BDSKY models do not explicitly consider imports into their respective regions. This is particularly relevant for Hong Kong as most initial sequences from the region were sequenced from importation events (Adam et al., 2020) which can introduce error into parameter estimation. However, as the epidemic expanded, more infections were attributable to autochthonous transmission (Adam et al., 2020), and the risk of error introduced by importation events decreased. Moreover, while sampling strategies can account for temporal variations in genomic sampling fractions there is currently no way to account for non-random sampling approaches in either the BDSKY or Skygrowth models (Vasylyeva et al., 2020). It is unclear how network-based sampling may affect parameter estimates obtained through these models (Volz, Koelle and Bedford, 2013) presenting a key challenge in molecular and genetic epidemiology. Spatial heterogeneities were also not explored within this work. This represents the next key step in understanding the impact of sampling as spatial sampling schemes would allow the reconstruction of the dispersal dynamics and estimation of epidemic overdispersion (k), a key epidemiological parameter. This work has highlighted the impact and importance that applying temporal sampli...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.