Structural Variants in SARS-CoV-2 Occur at Template-Switching Hotspots

Abstract

The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and has largely ignored the role of structural variants (SVs) and recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of imperfect homologous recombination between SARS-CoV-2 replicates, and provide four independent pieces of evidence. (1) The SVs from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5’ and 3’ breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RNA-dependent RNA polymerase (RdRp) template-switching. (3) Within raw reads, these structural variant hotspots show cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these structural variants are both consequences of de novo recombination events within a host and artifacts of previous recombination. (4) Within the RNA secondary structure, the indels occur in “arms” of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.
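As a rough illustration of what "clustered at specific regions of the genome" could mean in practice, the sketch below bins indel start positions into fixed-width windows and reports windows that exceed a count threshold. This is not the authors' method: the window size, the threshold, the function names (`window_counts`, `hotspots`), and the toy positions are all assumptions made for illustration. The only external fact used is the 29,903-nucleotide length of the Wuhan-Hu-1 reference genome (NC_045512.2).

```python
GENOME_LENGTH = 29_903  # length of the Wuhan-Hu-1 reference (NC_045512.2)


def window_counts(positions, window=500, genome_length=GENOME_LENGTH):
    """Count 1-based indel start positions in consecutive fixed-width windows."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} is outside 1..{genome_length}")
        counts[(pos - 1) // window] += 1
    return counts


def hotspots(positions, window=500, min_count=3, genome_length=GENOME_LENGTH):
    """Return (start, end, count) for each window holding at least min_count indels.

    The threshold is a hypothetical cutoff, not a statistical test; a real
    analysis would compare counts against a null model of indel placement.
    """
    found = []
    for i, count in enumerate(window_counts(positions, window, genome_length)):
        if count >= min_count:
            start = i * window + 1
            end = min((i + 1) * window, genome_length)
            found.append((start, end, count))
    return found


if __name__ == "__main__":
    # Made-up positions for demonstration only; not data from the paper.
    toy_positions = [510, 520, 700, 999, 15_000, 29_903]
    for start, end, count in hotspots(toy_positions):
        print(f"{start}-{end}: {count} indels")
```

Fixed windows keep the sketch short; a sliding window or a kernel density estimate would avoid splitting a cluster across a window boundary.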

Article activity feed

  1. SciScore for 10.1101/2020.09.01.278952:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: To obtain the raw reads, we accessed the NCBI SRA run browser on June 3, 2020.
    Resource: NCBI SRA run browser
    Suggested: European Nucleotide Archive (ENA), RRID:SCR_006515

    Sentence: We marked and removed PCR duplicates using GATK’s MarkDuplicates.
    Resource: GATK’s
    Suggested: None

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.