Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California

Abstract

Genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreaks is valuable for tracing the sources and perhaps for drawing lessons about preventing future outbreaks. Genomic analysis by Deng et al. revealed that Northern California experienced a complex series of introductions of the virus, deriving not only from state-to-state transmission but also from international travel by air and ship. The study highlights the importance of being able to rapidly test and trace contacts of positive cases to enable swift control.

Science, this issue p. 582

Article activity feed

  1. SciScore for 10.1101/2020.03.27.20044925:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: Ethics Statement: Clinical specimens were processed at the University of California San Francisco (UCSF) under protocols approved by the UCSF Institutional Review Board (protocol no. 10-01116, 11-05519).
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences / Resources
    Sequencing libraries were then sequenced on MiSeq, Nextseq, or HiSeq 1500 (Illumina Inc., San Diego, USA) as 1×150 single-end or 2×150 paired-end reads.
    MiSeq
    suggested: (A5-miseq, RRID:SCR_012148)
    Genome assembly and consensus generation: Raw reads were first screened via BLASTn (BLAST+ package 2.9.0) (23) for alignment to reference strain NC_045512.
    BLASTn
    suggested: (BLASTN, RRID:SCR_001598)
    BLAST+
    suggested: (Japan Bioinformatics, RRID:SCR_012250)
    They were then aligned to the reference with LASTZ version 1.04.03.
    LASTZ
    suggested: (LASTZ, RRID:SCR_018556)
    SARS-CoV-2 reads were trimmed using Geneious version 11.1.3 by removal of 13 nucleotides (nt) (the length of the MSSPE primer) and low-quality reads from the ends, followed by removal of duplicate reads.
    Geneious
    suggested: (Geneious, RRID:SCR_010519)
    Phylogenetic analysis and genomic comparison: Sequences were aligned using MAFFT v7.427 (24) under default settings and multiple sequence alignments were manually corrected.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Phylogenetic trees were constructed in PhyML v3.3 (25) under an HKY+Γ4 substitution model (26, 27).
    PhyML
    suggested: (PhyML, RRID:SCR_014629)
    The locations of SNPs and gaps were extracted from alignments using a custom Python script and visualized using a custom R script.
    Python
    suggested: (IPython, RRID:SCR_001658)
    Submission of the genomes and raw sequence data to NIH GenBank and Sequence Read Archive (SRA) is pending.
    Sequence Read Archive
    suggested: (DDBJ Sequence Read Archive, RRID:SCR_001370)
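
The table above quotes the step "the location of SNPs and gaps were extracted from alignments using a custom Python script". The authors' script is not shown here, so the following is only a minimal sketch of how such an extraction might look, assuming two equal-length rows taken from a multiple sequence alignment with `-` as the gap character. The function name `find_snps_and_gaps` and the choice to report positions in 1-based reference coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

Snp = Tuple[int, str, str]   # (reference position, reference base, sample base)
Gap = Tuple[int, int]        # (first, last) reference position covered by a gap


def find_snps_and_gaps(reference: str, sample: str) -> Tuple[List[Snp], List[Gap]]:
    """Compare one aligned sample row against the aligned reference row.

    Hypothetical helper, not the authors' script. Positions are 1-based and
    counted along the ungapped reference.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned rows must have the same length")

    snps: List[Snp] = []
    gaps: List[Gap] = []
    ref_pos = 0        # position in the ungapped reference
    gap_start = None   # start of the gap currently being extended, if any

    for ref_base, base in zip(reference.upper(), sample.upper()):
        if ref_base == "-":
            # Column is an insertion relative to the reference; skip it.
            continue
        ref_pos += 1
        if base == "-":
            # Sample lacks this reference position: open or extend a gap.
            if gap_start is None:
                gap_start = ref_pos
            continue
        if gap_start is not None:
            # The gap ended at the previous reference position.
            gaps.append((gap_start, ref_pos - 1))
            gap_start = None
        # Count only unambiguous bases as SNPs (N and IUPAC codes are ignored).
        if ref_base in "ACGT" and base in "ACGT" and ref_base != base:
            snps.append((ref_pos, ref_base, base))

    if gap_start is not None:
        # Gap runs to the end of the alignment.
        gaps.append((gap_start, ref_pos))
    return snps, gaps
```

Called as `find_snps_and_gaps("ACGT-ACGTA", "ACTT-A--TA")`, the sketch reports one substitution at reference position 3 and one gap spanning positions 6 to 7. Ambiguous bases are deliberately skipped so that low-coverage positions are not counted as variants; a real pipeline may treat them differently.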

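The read-processing step quoted in the table ("removal of 13 nucleotides (nt) (the length of the MSSPE primer) ... followed by removal of duplicate reads") was performed in Geneious. As a rough illustration of the idea only, and not of Geneious's behavior, the sketch below trims a fixed number of bases and then drops exact duplicates. It assumes that trimming applies to both ends of each read and that duplicates are reads with identical trimmed sequence; quality-based trimming is omitted. The names `trim_ends` and `trim_and_deduplicate` are hypothetical.

```python
from typing import Iterable, List

PRIMER_LENGTH = 13  # length of the MSSPE primer, as stated in the quoted sentence


def trim_ends(read: str, n: int = PRIMER_LENGTH) -> str:
    """Remove n bases from each end of a read (assumption: both ends)."""
    if len(read) <= 2 * n:
        return ""  # nothing would remain after trimming
    return read[n:len(read) - n]


def trim_and_deduplicate(reads: Iterable[str], n: int = PRIMER_LENGTH) -> List[str]:
    """Trim every read, then keep the first copy of each distinct sequence."""
    seen = set()
    kept: List[str] = []
    for read in reads:
        trimmed = trim_ends(read, n)
        if not trimmed or trimmed in seen:
            continue  # drop reads that are empty after trimming, and duplicates
        seen.add(trimmed)
        kept.append(trimmed)
    return kept
```

For example, two reads that differ only in their first and last 13 bases collapse to a single trimmed read, and a read shorter than 26 bases is discarded.
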
    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our epidemiological and genomic survey of SARS-CoV-2 has several limitations. First, this initial analysis represents relatively sparse sampling of cases. Undersampling of virus genomes is due in part to the high proportion of cases (80%) with asymptomatic or mild disease (3, 4, 20) and limited diagnostic testing for COVID-19 infection to date in California and throughout the United States. Second, the majority of samples analyzed were obtained from public health laboratories and thus may not be representative of the general population. Finally, phylogenetic clustering of viruses from different locations, such as Washington State and California in the same clade, does not prove directionality of spread. Despite this, our study shows that more robust insights into COVID-19 transmission are achievable if virus genomic diversity is combined and jointly interpreted with detailed epidemiological case data. Public health containment measures such as prompt isolation and contact tracing, as performed in the Solano County and Santa Clara County clusters, become more difficult to maintain once a lineage becomes established in the community. Our data suggest concerning trends in this direction, such as the association between the WA1 lineage and community-acquired COVID-19 cases in several counties of Northern California, travel-associated introduction of SARS-CoV-2 into the San Francisco Bay Area from New York state, and a virus from the lineage associated with a Santa Clara County cl...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.