Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

The appearance of multiple new SARS-CoV-2 variants during the winter of 2020-2021 is a matter of grave concern. Some of these new variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the COVID-19 pandemic. So far, analysis of new SARS-CoV-2 variants focused primarily on point nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers although the furin site insert in the spike protein is thought to be a determinant of SARS-CoV-2 virulence and other inserts might have contributed to coronavirus pathogenicity as well. Here, we investigate insertions in SARS-CoV-2 genomes and identify 347 unique inserts of different lengths. We present evidence that these inserts reflect actual virus variance rather than sequencing errors. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes, polymerase slippage and template switch that might be associated with the synthesis of subgenomic RNAs. We show that inserts in the Spike glycoprotein can affect its antigenic properties and thus merit monitoring. At least, three inserts in the N-terminal domain of the Spike (ins245IME, ins246DSWG, and ins248SSLT) that were first detected in 2021 are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity.

Article activity feed

  1. SciScore for 10.1101/2021.04.23.441209: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    Insertion validation from raw read data: Raw reads were downloaded from SRA database (https://www.ncbi.nlm.nih.gov/sra) with SRA Toolkit (Supplementary Table 1).
    https://www.ncbi.nlm.nih.gov/sra
    suggested: (NCBI Sequence Read Archive (SRA, RRID:SCR_004891)
    The reads were mapped to the SARS-CoV-2 reference genome (NC_045512.2) with bowtie2 version 2.2.1 42, either in pair mode of single read mode, depending on the type of data deposited to the SRA.
    bowtie2
    suggested: (Bowtie 2, RRID:SCR_016368)
    The variants in each genome were called with LoFreq version 2.1.5 43 as described in Galaxy (https://github.com/galaxyproject/SARS-CoV-2/blob/master/genomics/4-Variation/variation_analysis.ipynb).
    Galaxy
    suggested: (Galaxy, RRID:SCR_006281)
    All insertions identified with LoFreq were visualized with the IGV software and manually inspected.
    LoFreq
    suggested: (LoFreq, RRID:SCR_013054)
    The clades containing the genomes of interest were extracted and vizualized with ETE 3 package for Python 45.
    Python
    suggested: (IPython, RRID:SCR_001658)
    The obtained protein models were visualized with Open-Source PyMOL version 2.4.
    PyMOL
    suggested: (PyMOL, RRID:SCR_000305)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.