A statewide analysis of SARS-CoV-2 transmission in New York

Abstract

New York State, in particular the New York City metropolitan area, was the early epicenter of the SARS-CoV-2 pandemic in the United States. Similar to initial pandemic dynamics in many metropolitan areas, multiple introductions from various locations appear to have contributed to the swell of positive cases. However, representation and analysis of samples from New York regions outside the greater New York City area were lacking, as were SARS-CoV-2 genomes from the earliest cases associated with the Westchester County outbreak, which represents the first outbreak recorded in New York State. The Wadsworth Center, the public health laboratory of New York State, sought to characterize the transmission dynamics of SARS-CoV-2 across the entire state of New York from March to September with the addition of over 600 genomes from under-sampled and previously unsampled New York counties and to more fully understand the breadth of the initial outbreak in Westchester County. Additional sequencing confirmed the dominance of B.1 and descendant lineages (collectively referred to as B.1.X) in New York State. Community structure, phylogenetic, and phylogeographic analyses suggested that the Westchester outbreak was associated with continued transmission of the virus throughout the state, even after travel restrictions and the on-pause measures of March, contributing to a substantial proportion of the B.1 transmission clusters as of September 30th, 2020.

Article activity feed

  1. SciScore for 10.1101/2021.02.20.21251598:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Briefly, reads were trimmed with TrimGalore (https://github.com/FelixKrueger/TrimGalore) and aligned to the reference assembly MN908947.3 (strain Wuhan-Hu-1) by BWA (Li & Durbin 2010).
    TrimGalore
    suggested: None
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Primers were trimmed with iVar (Grubaugh et al. 2018) and variants were called with samtools mpileup function (Li et al. 2009), the output of which was used by iVar to generate consensus sequences.
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Variants were annotated by the Coronapp web tool (http://giorgilab.dyndns.org/coronapp/) (Mercatelli et al. 2020) and clade defining SNPs were identified by a custom Python script.
    Python
    suggested: (IPython, RRID:SCR_001658)
    Trees and annotations were visualized in FigTree v.
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
    Additionally, problematic positions 13402, 24380, and 24390 were masked before aligning with MAFFT v.7.470.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Maximum likelihood phylogenetic reconstruction was performed in IQ-TREE v.
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
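The masking step quoted in the table (positions 13402, 24380, and 24390 masked before alignment) can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `mask_positions` and the choice of `N` as the mask character are assumptions made for illustration, and the sketch assumes each consensus sequence shares the coordinate system of the reference MN908947.3 (no insertions or deletions shifting positions).

```python
# Hypothetical helper, not taken from the preprint's code.
# Positions are the 1-based coordinates quoted in the SciScore table.
PROBLEMATIC_POSITIONS = (13402, 24380, 24390)


def mask_positions(sequence, positions=PROBLEMATIC_POSITIONS, mask_char="N"):
    """Return a copy of `sequence` with the given 1-based positions masked.

    Assumes the sequence is already in reference coordinates; positions
    that fall outside the sequence are silently skipped.
    """
    bases = list(sequence)
    for pos in positions:
        index = pos - 1  # convert 1-based coordinate to 0-based index
        if 0 <= index < len(bases):
            bases[index] = mask_char
    return "".join(bases)
```

For example, `mask_positions(consensus)` would return the consensus string with those three sites replaced by `N`, leaving its length unchanged so that downstream alignment coordinates are unaffected.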

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Thus, despite sampling limitations, some regions appear to be characterized by SARS-CoV-2 lineages that were less representative of the entire state, implicating localized viral transmission as a greater influence on SARS-CoV-2 composition than inter-regional spread. Overall, the diversity of observed lineages decreased with time since the beginning of the New York epidemic in March (Figure 2). This is particularly apparent by June-September, where sequence diversity is solely concentrated in the B.1.X portion of the tree (Figure 2). This does not take into account the more limited sequencing that occurred during later months, which might hinder the ability to detect variants circulating at low frequency. In fact, three regions with the greatest number of lineages in March (New York City, Mid-Hudson, and Long Island) had no data available for various later months (as of the time of writing). The number of lineages detected per month was significantly correlated with the number of samples sequenced (Spearman rank correlation coefficient r=0.74, p-value=3.9×10^-8) and randomly down-sampling earlier months (March and April) to the size of May and June datasets revealed significant decreases in the number of lineages identified (one-sample t-test p-values < 2.2×10^-16). However, rarefaction analyses demonstrated that while lineages continued to be identified as more regions were sampled for March and April, lineage richness remained essentially flat or unchanged for May and June-S...
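The down-sampling check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' analysis: the function names, the number of replicates, and the fixed seed are all hypothetical, and the sketch only produces the distribution of lineage counts from repeated random subsamples (the one-sample t-test mentioned in the text is left out).

```python
import random


def lineage_richness(lineages):
    """Count the distinct lineage labels in a collection of samples."""
    return len(set(lineages))


def downsampled_richness(lineages, sample_size, n_replicates=1000, seed=0):
    """Repeatedly subsample without replacement and record lineage richness.

    `lineages` holds one lineage label per sequenced sample (e.g. all
    samples from an early month); `sample_size` is the size of the smaller
    dataset being matched. `n_replicates` and `seed` are illustrative
    defaults, not values from the preprint.
    """
    if sample_size > len(lineages):
        raise ValueError("sample_size exceeds the number of available samples")
    rng = random.Random(seed)
    return [
        lineage_richness(rng.sample(lineages, sample_size))
        for _ in range(n_replicates)
    ]
```

Given a list of lineage labels for March, calling `downsampled_richness(march_labels, sample_size=len(may_labels))` would yield one richness value per replicate, which could then be compared against the richness actually observed in the smaller month.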

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.