Unrecognized introductions of SARS-CoV-2 into the US state of Georgia shaped the early epidemic

Abstract

In early 2020, as diagnostic and surveillance responses for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ramped up, attention focused primarily on returning international travelers. Here, we build on existing studies characterizing early patterns of SARS-CoV-2 spread within the USA by analyzing detailed clinical, molecular, and viral genomic data from the state of Georgia through March 2020. We find evidence for multiple early introductions into Georgia, despite relatively sparse sampling. Most sampled sequences likely stemmed from a single or small number of introductions from Asia three weeks prior to the state’s first detected infection. Our analysis of sequences from domestic travelers demonstrates widespread circulation of closely related viruses in multiple US states by the end of March 2020. Our findings indicate that the exclusive focus on identifying SARS-CoV-2 in returning international travelers early in the pandemic may have led to a failure to recognize locally circulating infections for several weeks and point toward a critical need for implementing rapid, broadly targeted surveillance efforts for future pandemics.

Article activity feed

  1. SciScore for 10.1101/2021.09.19.21262615:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Numpy v.
    Numpy
    suggested: (NumPy, RRID:SCR_008633)
    2.1.3 [48] was used to generate maximum likelihood phylogenies with 1000 ultrafast bootstrap replicates [49], collapsing small branches, and using ModelFinder to identify the best-fit nucleotide substitution model [50].
    ModelFinder
    suggested: None
    Root-to-tip regression was conducted using SciPy v.
    SciPy
    suggested: (SciPy, RRID:SCR_008058)
    IQ-Tree was used to generate a maximum likelihood phylogeny of these sequences with the same parameters as described above and TreeTime was used to remove any samples falling outside four interquartile ranges of the expected molecular clock rate, rooted at the best fit root as identified by least-squares regression.
    IQ-Tree
    suggested: (IQ-TREE, RRID:SCR_017254)
    Bayesian phylogenetic inference was conducted using BEAST v2.6.6 [54] with Beagle v3.1.2 [55] and discrete trait estimation [56] implemented in BEAST_CLASSIC v.1.50.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    Beagle
    suggested: (BEAGLE, RRID:SCR_001789)
    Downstream analysis of the TreeTime and BEAST output was conducted in Python using BioPython, Pandas, and NumPy.
    BioPython
    suggested: (Biopython, RRID:SCR_007173)
    Results were visualized using Baltic (https://github.com/evogytis/baltic), Matplotlib v.3.3.3 [56], and Seaborn v.0.11.1 [57].
    Matplotlib
    suggested: (MatPlotLib, RRID:SCR_008624)
    This analysis was conducted in Python using NumPy and Pandas.
    Python
    suggested: (IPython, RRID:SCR_001658)
    4.0.4 using RStudio v.
    RStudio
    suggested: (RStudio, RRID:SCR_000432)
    1.4.1106 with GGplot2 v.
    GGplot2
    suggested: (ggplot2, RRID:SCR_014601)
    The variants in the mutational profile were annotated using snpEff v. 5.0 [59].
    snpEff
    suggested: (SnpEff, RRID:SCR_005191)
    We removed non-human samples, those related to cruise ships, and samples with travel history and aligned them to Wuhan/Hu-1 with MAFFT with the same parameters described above.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
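
    The root-to-tip regression and interquartile-range outlier filtering quoted in Table 2 can be illustrated with a minimal sketch. This is not the authors' code and not TreeTime's implementation: the function names, the `n_iqr` parameter, and the choice to measure outliers by regression residuals are assumptions made for illustration. It uses only `scipy.stats.linregress` and NumPy, taking decimal sampling dates and root-to-tip distances (substitutions per site) as inputs.

```python
import numpy as np
from scipy import stats


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, intercept, r_squared, tmrca), where rate is the slope
    (substitutions/site/year if dates are decimal years) and tmrca is the
    x-intercept of the fitted line.
    """
    dates = np.asarray(dates, dtype=float)
    distances = np.asarray(distances, dtype=float)
    fit = stats.linregress(dates, distances)
    tmrca = -fit.intercept / fit.slope  # date at which fitted distance is zero
    return fit.slope, fit.intercept, fit.rvalue ** 2, tmrca


def iqr_outlier_mask(dates, distances, n_iqr=4.0):
    """Boolean mask that is True for samples to keep.

    A sample is kept when its residual from the fitted clock line is within
    n_iqr interquartile ranges of zero (hypothetical rule for illustration).
    """
    dates = np.asarray(dates, dtype=float)
    distances = np.asarray(distances, dtype=float)
    rate, intercept, _, _ = root_to_tip_regression(dates, distances)
    residuals = distances - (rate * dates + intercept)
    q1, q3 = np.percentile(residuals, [25, 75])
    return np.abs(residuals) <= n_iqr * (q3 - q1)
```

    In practice the distances would come from a rooted tree (for example, one parsed with BioPython), and the choice of root affects both the slope and the residuals, so a filter like this is only meaningful after rooting.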

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study benefited from linked clinical and epidemiological data for nearly half of the SARS-CoV-2 samples sequenced, but despite extensive chart review, we encountered limitations, e.g., in reporting specific dates of travel and symptom onset. Thus, there is a need for a dedicated infrastructure for data collection in the setting of outbreak analysis, beyond routinely collected clinical data. In addition to evaluating SARS-CoV-2 introductions, our study also provides information regarding the dynamics of early SARS-CoV-2 lineages in the U.S. The 19B subclade that caused most of the infections described in this study appears to have spread from GA both domestically and internationally (e.g. to Australia) before dying out in April/May of 2020. The apparent extinction of this D614-containing 19B subclade occurred concurrently with the widely reported sweep of SARS-CoV-2 clades harboring the 614G mutation [12]. The increased transmission of 614G-containing viruses may be due to their ability to cause infection with higher viral loads [23,25]. We did not observe a difference in either viral load or subgenomic RNA in patients with D614- or 614G-containing viruses in this study, which may be due to small sample size. While the 19B subclade reported here was associated with limited forward transmission, we did not find strong evidence for ongoing transmission from the other observed introductions of SARS-CoV-2 into GA. However, we primarily analyzed only genomes collected through the end of...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.