Sequencing identifies multiple early introductions of SARS-CoV-2 to the New York City region
This article has been reviewed by the following groups
Listed in:
- Evaluated articles (ScreenIT)
Abstract
Effective public response to a pandemic relies upon accurate measurement of the extent and dynamics of an outbreak. Viral genome sequencing has emerged as a powerful approach to link seemingly unrelated cases, and large-scale sequencing surveillance can inform on critical epidemiological parameters. Here, we report the analysis of 864 SARS-CoV-2 sequences from cases in the New York City metropolitan area during the COVID-19 outbreak in spring 2020. The majority of cases had no recent travel history or known exposure, and genetically linked cases were spread throughout the region. Comparison to global viral sequences showed that early transmission was most linked to cases from Europe. Our data are consistent with numerous seeds from multiple sources and a prolonged period of unrecognized community spreading. This work highlights the complementary role of genomic surveillance in addition to traditional epidemiological indicators.
Article activity feed
SciScore for 10.1101/2020.04.15.20064931
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (sentence from the manuscript, then the resource it names):
- Libraries presumed more suitable for capture (generally, qPCR Ct value > 30) were enriched for SARS-CoV-2 genomic sequences using custom biotinylated DNA probe pools either from Twist Biosciences or Integrated DNA Technologies: In general, we pooled samples with similar Ct values and accounted for variations in parent library concentration, multiplexing up to 23 libraries per reaction.
  Twist Biosciences, suggested: None
- Sequenced read processing: Reads were demultiplexed with Illumina bcl2fastq v2.20 requiring a perfect match to indexing barcode sequences.
  bcl2fastq, suggested: (bcl2fastq, RRID:SCR_015058)
- Illumina sequencing adapters were trimmed with Trimmomatic v0.39 (Bolger et al. 2014).
  Trimmomatic, suggested: (Trimmomatic, RRID:SCR_011848)
- Reads were aligned using BWA v0.7.17 (Li and Durbin 2009) to a custom index containing human genome reference (GRCh38/hg38) including unscaffolded contigs and alternate references plus the reference SARS-CoV-2 genome (NC_045512.2, wuhCor1).
  BWA, suggested: (BWA, RRID:SCR_010910)
- Presumed PCR duplicates were marked using samblaster v0.1.24 (Faust and Hall 2014).
  samblaster, suggested: (SAMBLASTER, RRID:SCR_000468)
- Variants were called across all samples using bcftools v1.9: Raw pileups were filtered using … Viral sequences were generated by applying VCF files to the reference sequence using ‘bcftools consensus’ with -m to mask sites below 20x with Ns, and -m N to mask sites of ambiguous genotypes with N.
- Geoplotting: The regional case heat map was generated using R v3.6.2 using the packages ggplot2 v3.3.0 for plotting, and sf v0.8 for geospatial data manipulation.
  ggplot2, suggested: (ggplot2, RRID:SCR_014601)
- Sequences were aligned along with the reference genome using MAFFT v7.453 (Katoh and Standley 2013), and the resulting alignment was masked to remove 100 bp from the beginning, 50 from the end, and uninformative point mutations (positions 11083, 13402, 21575, 24389, 24390).
  MAFFT, suggested: (MAFFT, RRID:SCR_011811)
- A maximum likelihood tree was estimated using IQ-TREE 1.6.1 using a HKY substitution model (Nguyen et al. 2015).
  IQ-TREE, suggested: (IQ-TREE, RRID:SCR_017254)

Results from OddPub: Thank you for sharing your code and data.
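The two masking steps quoted in Table 2 (masking consensus sites below 20x depth, and masking the alignment ends plus five listed positions) can be illustrated with a short Python sketch. This is not the authors' code. It assumes that per-position read depths are available as a plain list, that every aligned sequence has exactly one column per reference position (insertion columns already removed), and that "mask" and "remove" both mean replacing bases with N. The thresholds and positions come from the methods sentences; the function names are hypothetical.

```python
"""Hypothetical sketch of the consensus and alignment masking steps.

Assumptions (not from the paper): depths are a list of ints, one per base;
aligned rows have one column per 1-based reference position; masking
replaces bases with 'N' instead of deleting them.
"""

from typing import Iterable, List, Optional, Sequence, Tuple

MIN_DEPTH = 20      # "mask sites below 20x with Ns"
TRIM_START = 100    # bp masked at the beginning of the alignment
TRIM_END = 50       # bp masked at the end of the alignment
MASKED_SITES = (11083, 13402, 21575, 24389, 24390)  # 1-based positions


def low_coverage_bed(chrom: str, depths: Sequence[int],
                     min_depth: int = MIN_DEPTH) -> List[Tuple[str, int, int]]:
    """Return BED-style intervals (0-based, half-open) where depth < min_depth.

    A BED file of such intervals is one kind of region file that
    'bcftools consensus -m' accepts for masking.
    """
    intervals: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    for i, depth in enumerate(depths):
        if depth < min_depth and start is None:
            start = i                            # open a low-coverage run
        elif depth >= min_depth and start is not None:
            intervals.append((chrom, start, i))  # close the run before i
            start = None
    if start is not None:                        # run reaches the last base
        intervals.append((chrom, start, len(depths)))
    return intervals


def mask_low_coverage(seq: str, depths: Sequence[int],
                      min_depth: int = MIN_DEPTH) -> str:
    """Replace every base covered by fewer than min_depth reads with N."""
    if len(seq) != len(depths):
        raise ValueError("sequence and depth track must have the same length")
    return "".join("N" if d < min_depth else base
                   for base, d in zip(seq, depths))


def mask_alignment_row(row: str, trim_start: int = TRIM_START,
                       trim_end: int = TRIM_END,
                       sites: Iterable[int] = MASKED_SITES) -> str:
    """Mask both ends and the given 1-based sites of one aligned row with N."""
    bases = list(row)
    n = len(bases)
    for i in range(min(trim_start, n)):          # leading bases
        bases[i] = "N"
    for i in range(max(n - trim_end, 0), n):     # trailing bases
        bases[i] = "N"
    for pos in sites:
        if 1 <= pos <= n:
            bases[pos - 1] = "N"                 # 1-based -> 0-based index
    return "".join(bases)


if __name__ == "__main__":
    depths = [25, 30, 5, 0, 22, 19, 40]
    print(low_coverage_bed("NC_045512.2", depths))
    # → [('NC_045512.2', 2, 4), ('NC_045512.2', 5, 6)]
    print(mask_low_coverage("ACGTACG", depths))  # → ACNNANG
    print(mask_alignment_row("ACGTACGTAC", trim_start=2, trim_end=1, sites=[5]))
    # → NNGTNCGTAN
```

Replacing bases with N rather than deleting them keeps every row the same length as the reference, so masked positions stay in register across the alignment used for tree building.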
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
