Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area

S. Wesley Long
Randall J. Olsen
Paul A. Christensen
David W. Bernard
James J. Davis
Maulik Shukla
Marcus Nguyen
Matthew Ojeda Saavedra
Prasanti Yerramilli
Layne Pruitt
Sishir Subedi
Hung-Che Kuo
Heather Hendrickson
Ghazaleh Eskandari
Hoang A. T. Nguyen
J. Hunter Long
Muthiah Kumaraswami
Jule Goike
Daniel Boutz
Jimmy Gollihar
Jason S. McLellan
Chia-Wei Chou
Kamyab Javanmardi
Ilya J. Finkelstein
James M. Musser

This article has been Reviewed by the following groups


Listed in

Evaluated articles (ScreenIT)
Evaluated articles (NCRC)

Abstract

There is concern about second and subsequent waves of COVID-19, caused by the SARS-CoV-2 coronavirus, in communities worldwide that have already had an initial disease wave. Metropolitan Houston, TX, with a population of 7 million, is experiencing a massive second disease wave that began in late May 2020. To understand the molecular population genomic architecture and evolution of SARS-CoV-2, and the relationship between virus genotypes and patient features, we sequenced the genomes of 5,085 SARS-CoV-2 strains from these two waves. Our report provides the first molecular characterization of SARS-CoV-2 strains causing two distinct COVID-19 disease waves.

SciScore for 10.1101/2020.09.22.20199125:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Antibodies
Sentences	Resources
The ACE2-hFc chimera was obtained from GenScript (Z03484), and the CR3022 antibody was purchased from Abcam (Ab273073)	Z03484 suggested: None CR3022 suggested: (Imported from the IEDB Cat# CR3022, RRID:AB_2848080) Ab273073 suggested: None
Experimental Models: Organisms/Strains
Sentences	Resources
For example, the ABO column was divided into four columns for A, B, AB, and O blood type.	AB suggested: RRID:BDSC_203
Software and Algorithms
Sentences	Resources
Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (91).	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Sequences were manually curated in JalView (92) to trim the ends and to remove sequences containing spurious inserts.	JalView suggested: (Jalview, RRID:SCR_006459)
Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (93).	FastTree suggested: (FastTree, RRID:SCR_015501)
Analysis of the nsp12 polymerase and S protein genes: The nsp12 virus polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72).	Python suggested: (IPython, RRID:SCR_001658) Biopython suggested: (Biopython, RRID:SCR_007173)
Statistical significance between the mean Ct value for strains with an aspartate (n=102) or glycine (n=812) amino acid at position 614 of the spike protein was determined with the Mann-Whitney test (GraphPad PRISM 8).	GraphPad suggested: (GraphPad Prism, RRID:SCR_002798)
Absorbance intensity (450nm) was normalized within a plate and EC50 values were calculated through 4-parameter logistic curve (4PL) analysis using GraphPad PRISM 8.4.3.	GraphPad PRISM suggested: (GraphPad Prism, RRID:SCR_002798)
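
The excerpt above reports EC50 values from 4PL fits in GraphPad PRISM 8.4.3. As a hedged illustration of what such a fit computes (not Prism's own algorithm), the pure-Python sketch below assumes min-max normalization within a plate. It fits bottom, top, EC50, and Hill slope by least squares, searching a coarse grid over EC50 and Hill slope. Because bottom and top enter the model linearly, they are solved exactly at each grid point. The function names, grid ranges, and normalization rule are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def four_pl(x: float, bottom: float, top: float, ec50: float, hill: float) -> float:
    """4PL response at concentration x > 0: rises from `bottom` to `top`,
    passing through the midpoint exactly at x == ec50."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


def normalize_plate(values: Sequence[float]) -> List[float]:
    """Assumed normalization: scale one plate's absorbances to [0, 1]
    using that plate's own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_4pl(conc: Sequence[float], resp: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares 4PL fit returning (bottom, top, ec50, hill).

    Grid search over EC50 and Hill slope; for each grid point the model is
    y = bottom + (top - bottom) * g, which is linear in bottom and top, so
    they come from a simple linear regression of y on g."""
    n = len(conc)
    y_mean = sum(resp) / n
    best = (math.inf, 0.0, 0.0, 0.0, 0.0)
    for i in range(601):                 # log10(EC50) from -3 to 3 in 0.01 steps
        ec50 = 10 ** (i / 100 - 3)
        for j in range(51):              # Hill slope from 0.5 to 3.0 in 0.05 steps
            hill = 0.5 + j / 20
            g = [1.0 / (1.0 + (ec50 / x) ** hill) for x in conc]
            g_mean = sum(g) / n
            var_g = sum((gi - g_mean) ** 2 for gi in g)
            if var_g < 1e-12:            # curve is flat over the tested range
                continue
            slope = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, resp)) / var_g
            bottom = y_mean - slope * g_mean
            sse = sum((yi - (bottom + slope * gi)) ** 2 for gi, yi in zip(g, resp))
            if sse < best[0]:
                best = (sse, bottom, bottom + slope, ec50, hill)
    return best[1], best[2], best[3], best[4]
```

A real analysis would refine the grid optimum with a continuous optimizer and report confidence intervals. The grid keeps this sketch dependency-free.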

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.


Version published to 10.1128/mbio.02707-20
Dec 22, 2020
NCRC
Oct 5, 2020

Our take

This study used genetic sequences to understand the introduction of SARS-CoV-2 into Houston, Texas, a large metropolitan area in the United States. The authors generated over 5,000 genomes, found evidence for multiple introductions of the virus from around the world, and pointed out key differences between the individuals affected during the first (March-May 2020) and second (May-July 2020) waves of infection. They find evidence that patients infected with virus carrying the D614G mutation have higher viral loads and argue that this supports the hypothesis that the mutation makes the virus more transmissible, but they do not specify when during infection samples were taken. In general, the breadth of the data and analyses presented in this paper is impressive, but some analyses lack nuance and sufficient validation.

Study design

other

Study population and setting

This study investigated two waves of COVID-19 infection in Houston, Texas, an ethnically diverse region of the United States. The authors generated and analyzed 5,085 SARS-CoV-2 genomes from samples collected between March 5 and July 7, 2020, from over 55,000 patients within the Houston Methodist Hospital system. The authors also experimentally analyzed synthetic spike protein constructs in the lab to evaluate the functional effects of specific mutations observed in their genomes. They focus specifically on the D614G mutation, which has been observed in a large proportion of viruses from recent cases in the United States and Europe, prompting speculation that viruses carrying this mutation may be more transmissible.

Summary of main findings

This study reports three primary findings: (1) multiple strains of SARS-CoV-2 were introduced into the Houston area in March 2020 from diverse geographic regions; (2) there were two waves (i.e., peaks) of cases, with the second wave affecting younger individuals with fewer comorbidities, and whereas 82% of genomes from the first wave carried the much-noted D614G mutation in the viral spike protein, the second wave consisted almost exclusively of strains with this mutation; (3) none of the 5,085 genomes had mutations at sites known to confer resistance to the drug remdesivir, and the D614G spike mutation was associated with higher viral loads, suggesting that viruses carrying it may be better able to enter host cells and spread through human populations.
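
"Higher viral load" here is inferred from RT-PCR cycle threshold (Ct) values, which the methods excerpt compares between D614 and G614 strains. A lower Ct means more starting template. The sketch below shows the standard idealized conversion: each cycle multiplies template by (1 + efficiency), so a Ct difference of ΔCt corresponds to roughly (1 + efficiency)^ΔCt-fold more template. The function name and the default of perfect doubling (efficiency = 1.0) are assumptions. Real assays deviate from ideal efficiency, and Ct values are not directly comparable across assays or specimen types.

```python
def fold_change_from_ct(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Estimated starting-template ratio of sample A to sample B.

    Assumes idealized exponential amplification in which each cycle
    multiplies template by (1 + efficiency); efficiency = 1.0 means perfect
    doubling. A sample that crosses threshold earlier (lower Ct) had more
    template, so the exponent is ct_b - ct_a.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only (not data from the study):
    # a sample at Ct 20 versus one at Ct 23 under perfect doubling.
    print(fold_change_from_ct(20.0, 23.0))
```

Because the relationship is exponential, even a difference of a few cycles in mean Ct corresponds to a several-fold difference in estimated template.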

Study strengths

This study is the largest SARS-CoV-2 genomic study in the United States to date, analyzing genomes from over 5,000 patients. The large number of genomes allowed the authors to comprehensively characterize the mutations circulating in the region. Additionally, the molecular experiments provide much-needed data on the potential functional implications of the D614G spike protein mutation.

Limitations

This study has several limitations. First, the authors claim their sequences are representative of COVID-19 cases in Houston but do not provide even basic demographic information on the infected individuals. Second, they claim that the increase in wave 2 cases carrying the D614G spike mutation is statistically significant, but they do not address that this conclusion assumes similar epidemic dynamics during the two waves. They also state that viral load is higher in patients infected with the D614G SARS-CoV-2 variant, but do not address or correct for when during the patient’s course of infection the sample was taken. This matters because several studies have already shown a clear correlation between viral load and days since symptom onset. Finally, they provide very few citations to previous work in their introduction (only in the discussion do they provide context for some of their findings), and they did not make their genomic data publicly available at the time of publication, hindering further research on this topic.
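
According to the methods excerpt, the viral-load comparison was a Mann-Whitney test on Ct values (GraphPad PRISM 8). For readers unfamiliar with the test, the following is a minimal pure-Python sketch. It uses average ranks for ties and a normal approximation with tie correction. Prism and other packages may instead report exact p-values for small samples. Note that the test compares the two groups as wholes and, on its own, cannot account for when each sample was collected. Addressing the limitation above would require days-since-onset metadata. The function name and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test returning (U for sample x, p-value).

    Tied values receive the average of the ranks they span; the p-value uses
    the normal approximation with a tie correction and no continuity correction.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled: List[Tuple[float, int]] = [(v, 0) for v in x] + [(v, 1) for v in y]
    pooled.sort(key=lambda item: item[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # every value tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only (not data from the study).
    ct_d614 = [24.1, 26.3, 22.8, 27.5, 25.0]
    ct_g614 = [20.2, 21.7, 23.4, 19.8, 22.1]
    print(mann_whitney_u(ct_d614, ct_g614))
```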

Value added

This study provides a large number of SARS-CoV-2 genome sequences from a diverse metropolitan area in the United States. These data provide a comprehensive picture of the strains circulating in Houston, Texas, between March and July 2020 and show that there were multiple introductions from around the globe. The absence of known resistance mutations also suggests that the drug remdesivir is likely to be effective against all strains circulating in the region. Finally, the authors conduct experiments on the spike protein of the virus to evaluate the potential effect of the D614G mutation on its transmissibility.

Version published to 10.1101/2020.09.22.20199125 on medRxiv
Sep 23, 2020

SciScore for 10.1101/2020.05.01.072652:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Nucleotide sequence alignments for the combined Houston and GISAID strains were generated using MAFFT version 7.130b with default parameters (32).	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Sequences were manually curated in JalView (33) to trim the ends and to remove sequences containing spurious inserts.	JalView suggested: (Jalview, RRID:SCR_006459)
Phylogenetic trees were generated using FastTree with the generalized time-reversible model for nucleotide sequences (34).	FastTree suggested: (FastTree, RRID:SCR_015501)
Analysis of the nsp12 polymerase and S protein genes: The nsp12 viral polymerase and S protein genes were analyzed by plotting SNP density in the consensus alignment using Python (Python v3.4.3, Biopython Package v1.72).	Python suggested: (IPython, RRID:SCR_001658) Biopython suggested: (Biopython, RRID:SCR_007173)
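
The methods sentence above describes plotting SNP density across the nsp12 and S genes from an alignment using Python and Biopython. The sketch below illustrates the counting step under stated assumptions. It treats one aligned row as the reference (for example, a consensus or reference genome), counts per column how many sequences differ from it while ignoring gaps and N, and averages the counts over non-overlapping windows. The function names, window scheme, and gap/N handling are hypothetical choices. Loading a real FASTA alignment could use Biopython's `Bio.AlignIO.read(handle, "fasta")`, and the plotting step is omitted.

```python
from typing import List, Sequence


def snp_counts(reference: str, aligned: Sequence[str]) -> List[int]:
    """Per alignment column, count sequences whose base differs from the
    reference. Gaps ('-') and ambiguous bases ('N') are not counted."""
    ref = reference.upper()
    counts = [0] * len(ref)
    for seq in aligned:
        if len(seq) != len(ref):
            raise ValueError("all sequences must be aligned to the reference length")
        for i, (r, b) in enumerate(zip(ref, seq.upper())):
            if r not in "-N" and b not in "-N" and b != r:
                counts[i] += 1
    return counts


def snp_density(counts: Sequence[int], window: int) -> List[float]:
    """Mean variant count per column in consecutive non-overlapping windows
    (the last window may be shorter)."""
    densities = []
    for start in range(0, len(counts), window):
        chunk = counts[start:start + window]
        densities.append(sum(chunk) / len(chunk))
    return densities


if __name__ == "__main__":
    # Toy alignment for illustration only.
    ref = "ACGTACGT"
    aln = ["ACGTACGT", "ACCTACGA", "ACCTAC-T"]
    print(snp_density(snp_counts(ref, aln), window=4))
```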

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.


Version published to 10.1101/2020.05.01.072652 on bioRxiv
May 1, 2020
