Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area

Abstract

There is concern about second and subsequent waves of COVID-19 caused by the SARS-CoV-2 coronavirus occurring in communities globally that had an initial disease wave. Metropolitan Houston, TX, with a population of 7 million, is experiencing a massive second disease wave that began in late May 2020. To understand SARS-CoV-2 molecular population genomic architecture and evolution and the relationship between virus genotypes and patient features, we sequenced the genomes of 5,085 SARS-CoV-2 strains from these two waves. Our report provides the first molecular characterization of SARS-CoV-2 strains causing two distinct COVID-19 disease waves.

Article activity feed

  1. SciScore for 10.1101/2020.09.22.20199125

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    The ACE2-hFc chimera was obtained from GenScript (Z03484), and the CR3022 antibody was purchased from Abcam (Ab273073)
    Z03484
    suggested: None
    CR3022
    suggested: (Imported from the IEDB Cat# CR3022, RRID:AB_2848080)
    Ab273073
    suggested: None
    Experimental Models: Organisms/Strains
    Sentences | Resources
    For example, the ABO column was divided into four columns for A, B, AB, and O blood type.
    AB
    suggested: (RRID:BDSC_203)
    Software and Algorithms
    Sentences | Resources
    Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (91).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Sequences were manually curated in JalView (92) to trim the ends and to remove sequences containing spurious inserts.
    JalView
    suggested: (Jalview, RRID:SCR_006459)
    Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (93).
    FastTree
    suggested: (FastTree, RRID:SCR_015501)
    Analysis of the nsp12 polymerase and S protein genes: The nsp12 virus polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    Statistical significance between the mean Ct value for strains with an aspartate (n=102) or glycine (n=812) amino acid at position 614 of the spike protein was determined with the Mann-Whitney test (GraphPad PRISM 8).
    GraphPad
    suggested: (GraphPad Prism, RRID:SCR_002798)
    Absorbance intensity (450nm) was normalized within a plate and EC50 values were calculated through 4-parameter logistic curve (4PL) analysis using GraphPad PRISM 8.4.3.
    GraphPad PRISM
    suggested: (GraphPad Prism, RRID:SCR_002798)
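
    The SNP-density step quoted in the table above (counting variable sites along a consensus alignment) can be illustrated with a minimal sketch. This is not the authors' script: the function names, the window size, and the toy alignment are all hypothetical, and the sketch uses only the Python standard library rather than Biopython.

```python
from collections import Counter


def consensus(seqs):
    """Majority base at each column of an alignment of equal-length strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def snp_density(seqs, window=3):
    """Count variable columns in each non-overlapping window.

    A column is counted as variable when at least one sequence differs
    from the consensus base there; gaps ('-') and ambiguous calls ('N')
    in a sequence are ignored.
    """
    cons = consensus(seqs)
    variable = [
        any(base != ref and base not in "-N" for base in col)
        for col, ref in zip(zip(*seqs), cons)
    ]
    return [sum(variable[i:i + window]) for i in range(0, len(variable), window)]


# Toy alignment (made-up sequences, for illustration only).
alignment = [
    "ATGCATGCA",
    "ATGCATGCA",
    "ATGTATGCA",
    "ATGCATGGA",
]
density = snp_density(alignment, window=3)
print(density)
```

    For a real genome alignment the window would be far larger, and the per-window counts would typically be plotted against genome position.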

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. Our take

    This study used genetic sequences to understand the introduction of SARS-CoV-2 into Houston, Texas, a large metropolitan area in the United States. The authors generated over 5,000 genomes and found evidence for multiple introductions of the virus from all over the world, and pointed out key differences in the individuals affected during the first (March-May 2020) and second (May-July 2020) waves of infection. They find evidence that patients infected with virus carrying the D614G mutation have higher viral loads, and argue that this supports the hypothesis that the mutation makes the virus more transmissible, but they do not specify when during infection the samples were taken. In general, the breadth of the data and analyses presented in this paper is impressive, but some analyses lack nuance and sufficient validation.

    Study design

    other

    Study population and setting

    This study investigated two waves of COVID-19 infection in Houston, Texas, an ethnically diverse region of the United States. The authors generated and analyzed 5,085 SARS-CoV-2 genomes, sampled between March 5 and July 7, 2020, from over 55,000 patients within the Houston Methodist Hospital system. The authors also experimentally analyzed synthetic spike protein constructs in the lab to evaluate the functional effects of specific mutations observed in their genomes. They specifically focus on the D614G mutation, which has been observed in a large portion of viruses from recent cases in the United States and Europe, prompting speculation that viruses with this mutation may be more transmissible.

    Summary of main findings

    This study reports three primary findings: (1) Multiple strains of SARS-CoV-2 were introduced into the Houston area in March 2020 from diverse geographic regions; (2) There were two waves (i.e., peaks) of cases, with the second wave affecting younger individuals with fewer comorbidities. The second wave consisted almost exclusively of SARS-CoV-2 strains with a much-noted mutation in the spike protein of the virus (D614G), whereas 82% of genomes from the first wave contained this mutation; (3) None of the 5,085 genomes had mutations at sites known to cause resistance to the drug remdesivir, but the D614G mutation in the spike protein was associated with higher viral loads, suggesting that virus carrying it may be better able to enter host cells and spread through human populations.

    Study strengths

    This study is the largest SARS-CoV-2 genomic study in the United States to date, analyzing data from over 5,000 patients. The large number of genomes analyzed allowed the authors to comprehensively understand the mutations circulating in the region. Additionally, molecular studies provide much-needed additional data on the potential functional implications of the D614G spike protein mutation.

    Limitations

    This study has several limitations. First, the authors claim their sequences are representative of COVID-19 cases in Houston, but do not provide information on even the basic demographics of the infected individuals. Second, they claim the increase in cases in wave 2 with the D614G spike protein is statistically significant, but they fail to address that this assumes similar epidemic dynamics during the two waves. They also state that viral load is higher in patients with the D614G SARS-CoV-2 variant, but do not address or correct for when the sample was taken during the patient’s course of infection; this matters because several studies have already shown a clear correlation between viral load and days since symptom onset. Finally, they provided very few citations to previous work in their introduction (only in the discussion do they provide context for some of their findings), and did not make their genomic data publicly available at the time of publication, hindering further research on this topic.
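
    To make the viral-load concern concrete, here is a minimal sketch of one way such a correction could look: comparing Ct values between D614 and G614 strains within strata of days since symptom onset, rather than pooling all samples. This is an illustration of the reviewers' point, not an analysis from the paper. The function names and record layout are hypothetical, the Ct values are made up, and only the Mann-Whitney U statistic is computed (no p-value).

```python
from collections import defaultdict


def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b, using midranks for ties.

    U equals the number of (x in a, y in b) pairs with x > y,
    counting tied pairs as one half.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return rank_sum_a - len(a) * (len(a) + 1) / 2


def stratified_u(records):
    """records: (days_since_onset, genotype, ct) tuples, genotype 'D' or 'G'.

    Returns {day: (U for D versus G, n_D, n_G)} for every stratum
    that contains both genotypes.
    """
    strata = defaultdict(lambda: {"D": [], "G": []})
    for day, genotype, ct in records:
        strata[day][genotype].append(ct)
    return {
        day: (mann_whitney_u(g["D"], g["G"]), len(g["D"]), len(g["G"]))
        for day, g in sorted(strata.items())
        if g["D"] and g["G"]
    }


# Hypothetical records: (days since symptom onset, residue at position 614, Ct).
records = [
    (2, "D", 24.0), (2, "D", 26.0), (2, "G", 22.0), (2, "G", 23.0),
    (7, "D", 30.0), (7, "G", 29.0), (7, "G", 31.0),
]
result = stratified_u(records)
print(result)
```

    A lower Ct corresponds to a higher viral load, so a U close to n_D × n_G within a stratum would indicate that D614 samples tend to have higher Ct values than G614 samples taken at a comparable point in infection.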

    Value added

    This study provides a large number of SARS-CoV-2 genome sequences from a diverse metropolitan area in the United States. These data provide a comprehensive picture of the strains circulating in Houston, Texas between March and July 2020, and show that there were multiple introductions from around the globe. The data also show that the popular drug remdesivir is likely to be effective against all strains circulating in the region. Finally, the authors conduct experiments on the spike protein of the virus and evaluate the potential effect of the D614G mutation on the transmissibility of the virus.

  3. SciScore for 10.1101/2020.05.01.072652

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (32).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Sequences were manually curated in JalView (33) to trim the ends and to remove sequences containing spurious inserts.
    JalView
    suggested: (Jalview, RRID:SCR_006459)
    Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (34).
    FastTree
    suggested: (FastTree, RRID:SCR_015501)
    Analysis of the nsp12 polymerase and S protein genes: The nsp12 viral polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Biopython
    suggested: (Biopython, RRID:SCR_007173)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.
