Limited genomic reconstruction of SARS-CoV-2 transmission history within local epidemiological clusters
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
A detailed understanding of how and when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission occurs is crucial for designing effective prevention measures. In addition to contact tracing, genome sequencing provides information to help infer who infected whom. However, the effectiveness of the genomic approach in this context depends on both (high enough) mutation and (low enough) transmission rates. The level of resolution that can be obtained when describing SARS-CoV-2 outbreaks using genomic information alone remains unclear. To answer this question, we sequenced forty-nine SARS-CoV-2 patient samples from ten local clusters in NW Spain for which partial epidemiological information was available and inferred transmission history using genomic variants. Importantly, we obtained high-quality genomic data, sequencing each sample twice and using unique barcodes to exclude cross-sample contamination. Phylogenetic and cluster analyses showed that consensus genomes were generally sufficient to discriminate among independent transmission clusters. However, levels of intrahost variation were low, which in most cases prevented the unambiguous identification of direct transmission events. After filtering out recurrent variants across clusters, the genomic data were generally compatible with the epidemiological information but did not support specific transmission events over possible alternatives. We estimated the effective transmission bottleneck size to be one to two viral particles for sample pairs whose donor–recipient relationship was likely. Our analyses suggest that intrahost genomic variation in SARS-CoV-2 might be generally limited and that homoplasy and recurrent errors complicate identifying shared intrahost variants. Reliable reconstruction of direct SARS-CoV-2 transmission based solely on genomic data seems hindered by a slow mutation rate, potential convergent events, and technical artifacts. Detailed contact tracing seems essential in most cases to study SARS-CoV-2 transmission at high resolution.
Article activity feed
SciScore for 10.1101/2021.08.08.21261673:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (Sentences / Resources)
- Sample collection: According to the epidemiological records, we identified 49 patients infected with SARS-CoV-2 conforming ten independent transmission clusters originated in nursing homes, family households, and birthday parties from the same city (Figure 1; Table S1).
  SARS-CoV-2 suggested: (BioLegend Cat# 946101, RRID:AB_2892515)
- Cluster J is a family in which J1 infected partner J2 and child J3.
  Cluster J suggested: None
- We sequenced the 98 libraries in two high-output (7.5 Gb) runs (60 and 38 samples, respectively) on an Illumina MiniSeq (PE150 reads) at the sequencing facility of the University of Vigo.
  MiniSeq suggested: None
- Variant calling and consensus sequences: We assessed the quality of the fastq files using FastQC (Andrews 2010).
  FastQC suggested: (FastQC, RRID:SCR_014583)
- Then we aligned the reads to the reference MN908947.3 from Wuhan using BWA-mem (Li 2013) and trimmed them with iVar (Grubaugh, Gangavarapu, et al. 2019).
  BWA-mem suggested: (Sniffles, RRID:SCR_017619)
- We evaluated the quality of the aligned trimmed reads using Picard v2.21.8 (http://broadinstitute.github.io/picard).
  Picard suggested: (Picard, RRID:SCR_006525)
- The calls obtained were confirmed with LoFreq (Wilm et al. 2012).
  LoFreq suggested: (LoFreq, RRID:SCR_013054)
- To build a consensus sequence for each sample, we merged the reads from the two replicates with SAMtools mpileup and fed them to iVar consensus with a minimum VAF threshold of 0.5.
  SAMtools suggested: (SAMTOOLS, RRID:SCR_002105)
- For this, we aligned the consensus sequences with the reference using MAFFT v.7 (Katoh and Standley 2013) (mafft --maxiterate 500 ) and ran IQ-TREE (v.2.0.6) (Nguyen et al. 2015) (iqtree2 -T AUTO -s -m TEST -b 1000 -o MN908947.3) with the best-fit nucleotide substitution model and 1,000 bootstrap replicates.
  MAFFT suggested: (MAFFT, RRID:SCR_011811)
  IQ-TREE suggested: (IQ-TREE, RRID:SCR_017254)
- We estimated the dN/dS ratio for each sample using the dNdScv package (Martincorena et al. 2017), recently adapted for its application to SARS-CoV-2 (Tonkin-Hill et al. 2020).
  dNdScv suggested: (dndSCV, RRID:SCR_017093)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
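As an illustration of the consensus step quoted in the resources table (reads from the two replicates merged, then a consensus called with a minimum VAF threshold of 0.5), the following is a minimal Python sketch. It is a simplified majority-rule stand-in and not iVar's actual algorithm: the per-position allele-count dictionaries, the function names, the `min_depth` cutoff, and the use of "N" for uncalled positions are all assumptions made for this example.

```python
from collections import Counter


def merge_replicates(rep1_counts, rep2_counts):
    """Pool allele counts from two technical replicates at one position."""
    return Counter(rep1_counts) + Counter(rep2_counts)


def consensus_base(counts, min_vaf=0.5, min_depth=10, ambiguous="N"):
    """Call the most frequent allele if its frequency reaches min_vaf.

    min_depth and the "N" fallback are illustrative assumptions,
    not values taken from the manuscript.
    """
    depth = sum(counts.values())
    if depth == 0 or depth < min_depth:
        return ambiguous
    base, n = max(counts.items(), key=lambda kv: kv[1])
    return base if n / depth >= min_vaf else ambiguous


def consensus_sequence(rep1_pileup, rep2_pileup, **kwargs):
    """Build a consensus string from two per-position lists of allele counts."""
    return "".join(
        consensus_base(merge_replicates(a, b), **kwargs)
        for a, b in zip(rep1_pileup, rep2_pileup)
    )


if __name__ == "__main__":
    # Hypothetical two-position pileups for the two replicates of one sample.
    rep1 = [{"A": 40, "G": 10}, {"C": 6, "T": 5, "G": 4}]
    rep2 = [{"A": 35, "G": 15}, {"C": 5, "T": 6, "G": 4}]
    print(consensus_sequence(rep1, rep2))
```

Pooling counts before calling means a position is resolved only when one allele dominates across both replicates combined; positions with no allele at or above the threshold are left uncalled in this sketch.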
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Methods for cluster delimitation that rely exclusively on intrahost variants did not work well in this regard. In contrast, methods based on differences at the consensus level could differentiate the clusters near perfectly. The latter suggests that in SARS-CoV-2, consensus sequences alone are enough to separate samples belonging to different clusters from the same area. At the same time, intra-host diversity does not seem to be sufficient for this task. We found a limited number of intrahost variants (∼8 before filtering recurrent variants and ∼3 after filtering), as reported in other studies (Kuipers et al. 2020; Seemann et al. 2020; Shen et al. 2020; Tonkin-Hill et al. 2020; Wölfel et al. 2020; Butler et al. 2021; Valesano et al. 2021; Y. Wang et al. 2021). Half of our samples (27/48) had a viral load above 10^3 copies/µL, which is the threshold determined in Valesano et al. (2021) for reliable identification of intrahost variants with a VAF ≥ 2% in single replicates. Unlike previous studies, we used technical replicates to stress variant calling reproducibility and added unique barcodes to each sample to discard the potential effect of cross-sample contamination. Here, transmission history within nursing homes or households, where most SARS-CoV-2 infections occur (Lee et al. 2020), was complicated to decipher. In general, all the methods we tried, even those relying on intrahost variation, could not provide clear transmission patterns within clusters, as seen before in c...
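The two filtering ideas mentioned in these sentences (keeping intrahost variants that reproduce across technical replicates, then removing variants that recur across clusters) can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: the data layout, the function names, the variant labels, and the definition of "recurrent" as "seen in more than one cluster" are assumptions; only the 2% VAF figure comes from the quoted text.

```python
from collections import defaultdict


def reproducible_variants(rep1_vaf, rep2_vaf, min_vaf=0.02):
    """Return variants called in both replicates with VAF >= min_vaf in each.

    rep1_vaf and rep2_vaf map a variant label to its allele frequency.
    """
    shared = set(rep1_vaf) & set(rep2_vaf)
    return {v for v in shared
            if rep1_vaf[v] >= min_vaf and rep2_vaf[v] >= min_vaf}


def filter_recurrent(variants_by_sample, cluster_of, max_clusters=1):
    """Drop variants observed in more than max_clusters distinct clusters.

    Assumption: "recurrent" means present in samples from several clusters,
    which may reflect homoplasy or a repeated technical error.
    """
    clusters_seen = defaultdict(set)
    for sample, variants in variants_by_sample.items():
        for variant in variants:
            clusters_seen[variant].add(cluster_of[sample])
    return {
        sample: {v for v in variants if len(clusters_seen[v]) <= max_clusters}
        for sample, variants in variants_by_sample.items()
    }


if __name__ == "__main__":
    # Hypothetical calls for one sample sequenced twice.
    rep1 = {"C1000T": 0.03, "G2000A": 0.10, "T3000C": 0.20}
    rep2 = {"C1000T": 0.04, "G2000A": 0.01}
    print(reproducible_variants(rep1, rep2))

    # Hypothetical per-sample variant sets and cluster labels.
    by_sample = {"A1": {"x", "y"}, "A2": {"x"}, "B1": {"y", "z"}}
    clusters = {"A1": "A", "A2": "A", "B1": "B"}
    print(filter_recurrent(by_sample, clusters))
```

Variants that survive both filters and are shared within a single cluster are the ones that could, in principle, inform who infected whom; the quoted text reports that few such variants remained.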
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.