SARS-CoV-2 variant evolution in the United States: High accumulation of viral mutations over time likely through serial Founder Events and mutational bursts

Abstract

Since the first case of COVID-19 in December 2019 in Wuhan, China, SARS-CoV-2 has spread worldwide and within a year and a half has caused 3.56 million deaths globally. With dramatically increasing infection numbers and the arrival of new variants with increased infectivity, tracking the evolution of its genome is crucial for effectively controlling the pandemic and informing vaccine platform development. Our study explores the evolution of SARS-CoV-2 in a representative cohort of sequences covering the entire genome in the United States, through all of 2020 and early 2021. Strikingly, we detected many accumulating Single Nucleotide Variations (SNVs) encoding amino acid changes in the SARS-CoV-2 genome, with a pattern indicative of RNA editing enzymes as major mutators of SARS-CoV-2 genomes. We report three major variants through October of 2020. These revealed 14 key mutations that were found in various combinations among 14 distinct predominant signatures. These signatures likely represent evolutionary lineages of SARS-CoV-2 in the U.S. and reveal clues to its evolution, such as a mutational burst in the summer of 2020 likely leading to a homegrown new variant, and a trend towards higher mutational load among viral isolates, but with occasional mutation loss. The last quarter of 2020 revealed a concerning accumulation of mostly novel low-frequency replacement mutations in the Spike protein, and a hypermutable glutamine residue near the putative furin cleavage site. Finally, end-of-year data and early 2021 data revealed the gradual rise to prevalence of known variants of concern, particularly B.1.1.7, that have acquired additional Spike mutations. Overall, our results suggest that predominant viral genomes are dynamically evolving over time, with periods of mutational bursts and unabated mutation accumulation. This high level of existing variation, even at low frequencies and especially in the Spike-encoding region, may become problematic when super-spreader events, akin to serial Founder Events in evolution, drive these rare mutations to prominence.

Article activity feed

  1. Rowena Bull

    Review of "SARS-CoV-2 variant evolution in the United States: High accumulation of viral mutations over time likely through serial Founder Events and mutational bursts"

    Reviewer: Rowena Bull (University of New South Wales) | 📗📗📗📗◻️

  2. SciScore for 10.1101/2021.02.19.431311:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Alignments were performed with the Multiple Sequence Alignment tool “Clustal Omega” [68,69], comparing each US state separately against the reference genome.
    Omega”
    suggested: None
    We used Clustal alignment outputs (with character counts) as input for our python 3.8 script to call SNVs (Single Nucleotide Variation), which incorporated from ‘biopython’ [70] alignment reading commands for outputs of variation from the reference.
    python
    suggested: (IPython, RRID:SCR_001658)
    Annotation of genomic variants with regards to regions in the viral genome (organized into ORFs) was performed employing NCBI RefSeq SARS-CoV-2 genome annotation, which is also publicly available in the NCBI SARS-CoV-2 Resources portal.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    For detecting specific deletions (eg ΔH69/V70) we employed the BLAST (Basic Local Alignment Search Tool) command line application, by calling for gap-containing sequences compared to the reference, in a locally-constructed database with all viral isolate sequences (n=8171) satisfying the aforementioned criteria.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
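
The quoted methods describe calling SNVs from alignments against the reference and separately flagging gap-containing sequences for deletions such as ΔH69/V70. The sketch below illustrates that general idea on two already-aligned rows. It is not the authors' script: the function names `call_snvs` and `find_deletions` are hypothetical, the inputs are plain strings rather than Clustal files, and the deletion scan works on alignment gaps instead of BLAST output.

```python
from typing import List, Tuple


def call_snvs(ref_aln: str, query_aln: str) -> List[Tuple[int, str, str]]:
    """List substitutions as (1-based reference position, ref base, query base).

    Both arguments are rows of the same alignment, so they must have equal
    length. Gap columns and ambiguous bases (e.g. N) are skipped.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    snvs = []
    ref_pos = 0  # position in the ungapped reference
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-":
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if r in "ACGT" and q in "ACGT" and r != q:
            snvs.append((ref_pos, r, q))
    return snvs


def find_deletions(ref_aln: str, query_aln: str) -> List[Tuple[int, int]]:
    """List deletions as (1-based reference start, length in bases).

    A deletion is a run of gap characters in the query row opposite
    reference bases.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    deletions = []
    ref_pos = 0
    run_start = None  # reference position where the current gap run began
    run_len = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # column absent from the reference; does not break a run
        ref_pos += 1
        if q == "-":
            if run_start is None:
                run_start, run_len = ref_pos, 0
            run_len += 1
        elif run_start is not None:
            deletions.append((run_start, run_len))
            run_start = None
    if run_start is not None:  # gap run reaching the end of the alignment
        deletions.append((run_start, run_len))
    return deletions


if __name__ == "__main__":
    reference = "ATGCATGCAT"
    isolate = "ATGTAT---T"
    print(call_snvs(reference, isolate))
    print(find_deletions(reference, isolate))
```

In a real pipeline the aligned rows would come from an alignment file, for example read with Biopython's `AlignIO.read(path, "clustal")`, and positions would then be mapped onto ORF annotations. Both steps are left out here to keep the sketch self-contained.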

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.