Molecular Architecture of Early Dissemination and Evolution of the SARS-CoV-2 Virus in Metropolitan Houston, Texas

Abstract

We sequenced the genomes of 320 SARS-CoV-2 strains from COVID-19 patients in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. These genomes came from viruses that caused infections during the earliest recognized phase of the pandemic in Houston. We identified substantial viral genomic diversity, which we interpret to mean that the virus was introduced into Houston many times independently by individuals who had traveled from different parts of the country and the world. Most of the viruses appear to be progeny of strains derived from Europe and Asia. We found no significant evidence of more virulent viral types, underscoring the link between severe disease, underlying medical conditions, and perhaps host genetics. We discovered a signal of selection acting on the spike protein, the primary target of massive vaccine efforts worldwide. The data provide a critical resource for assessing virus evolution, the origin of new outbreaks, and the effect of the host immune response.

Significance

COVID-19, the disease caused by the SARS-CoV-2 virus, is a global pandemic. To better understand the first phase of virus spread in metropolitan Houston, Texas, we sequenced the genomes of 320 SARS-CoV-2 strains recovered from COVID-19 patients early in the Houston viral arc. We found no evidence that a particular strain or its progeny causes more severe disease, underscoring the connection between severe disease, underlying health conditions, and host genetics. Some amino acid replacements in the spike protein suggest that positive immune selection is shaping variation in this protein. Our analysis traces the early molecular architecture of SARS-CoV-2 in Houston and will help us understand the origin and trajectory of future infection spikes.

Article activity feed

  1. SciScore for 10.1101/2020.05.01.072652:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (32).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Sequences were manually curated in JalView (33) to trim the ends and to remove sequences containing spurious inserts.
    JalView
    suggested: (Jalview, RRID:SCR_006459)
    Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (34).
    FastTree
    suggested: (FastTree, RRID:SCR_015501)
    Analysis of the nsp12 polymerase and S protein genes: The nsp12 viral polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
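The FastTree entry above names the generalized time-reversible (GTR) nucleotide model. For reference, the model is usually written in the textbook form below. It has six symmetric exchangeability rates r_ij and four equilibrium base frequencies π_i. This is only the generic definition. The quoted methods do not say how FastTree estimates these parameters or handles rate variation across sites, so the normalization line shows a common convention rather than a setting confirmed for this study.

```latex
% Off-diagonal rates of the 4x4 GTR rate matrix Q, with i, j \in \{A, C, G, T\}
q_{ij} = r_{ij}\,\pi_j \quad (i \neq j), \qquad r_{ij} = r_{ji}, \qquad
q_{ii} = -\sum_{j \neq i} q_{ij}

% Time reversibility (detailed balance) follows from the symmetry of r_{ij}
\pi_i\, q_{ij} = \pi_i\, r_{ij}\, \pi_j = \pi_j\, q_{ji}

% Substitution probabilities along a branch of length t
P(t) = e^{Qt}, \qquad -\sum_{i} \pi_i\, q_{ii} = 1
% (common normalization: branch lengths in expected substitutions per site)
```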
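The nsp12/S entry above says only that SNP density was plotted from the consensus alignment using Python and Biopython. It does not give the exact definition. Below is a minimal, standard-library-only sketch of one plausible definition; it is an assumption, not the authors' code. It assumes that a SNP site is an alignment column where at least one sequence carries an A/C/G/T base that differs from the majority-rule consensus. Density is counted in fixed, non-overlapping windows. The function names, gap and ambiguity handling, and window scheme are hypothetical. The nsp12 and S gene coordinates are not stated in this report, so the caller supplies them as `start`/`end`.

```python
from collections import Counter
from typing import List

NUCLEOTIDES = frozenset("ACGT")


def consensus(seqs: List[str]) -> str:
    """Majority-rule consensus of aligned sequences (hypothetical helper).

    Only A/C/G/T are counted; gaps ('-') and ambiguity codes such as 'N'
    are ignored. Columns with no unambiguous base become '-'. Ties go to
    the base seen first in the column (Counter.most_common ordering).
    """
    if not seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in NUCLEOTIDES)
        columns.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(columns)


def snp_sites(seqs: List[str], cons: str) -> List[int]:
    """0-based columns where some sequence has an A/C/G/T base unlike the consensus."""
    sites = []
    for i, ref_base in enumerate(cons):
        if ref_base not in NUCLEOTIDES:
            continue  # skip all-gap or all-ambiguous columns
        if any(s[i].upper() in NUCLEOTIDES and s[i].upper() != ref_base for s in seqs):
            sites.append(i)
    return sites


def snp_density(sites: List[int], start: int, end: int, window: int) -> List[int]:
    """SNP-site counts per non-overlapping window over the half-open region [start, end)."""
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = max(0, (end - start + window - 1) // window)
    counts = [0] * n_windows
    for i in sites:
        if start <= i < end:
            counts[(i - start) // window] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment invented for illustration; not Houston data.
    aln = ["ATGCTAGC", "ATGCTAGC", "ATGTTAGA", "ATG-TNGC"]
    cons = consensus(aln)
    sites = snp_sites(aln, cons)
    print(cons, sites, snp_density(sites, 0, len(cons), 4))  # ATGCTAGC [3, 7] [1, 1]
```

To use real data, the aligned sequences could be loaded with Biopython's `AlignIO.read(path, "fasta")` and converted with `[str(rec.seq) for rec in alignment]`; the file path would be the user's own. The per-window counts can then be passed to any plotting library. Non-overlapping windows keep each SNP site counted exactly once. A sliding window would give a smoother curve at the cost of counting sites repeatedly.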

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.