Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the need for continuous screening of virus isolates

Abstract

Owing to the wide spread of the COVID-19 pandemic, the SARS-CoV-2 genome is evolving in diverse human populations. Several studies have already reported different strains and an increase in the mutation rate. In particular, mutations in the SARS-CoV-2 spike glycoprotein are of great interest because it mediates infection in humans and recently approved mRNA vaccines are designed to induce immune responses against it. We analyzed 1,036,030 SARS-CoV-2 genome assemblies and 30,806 NGS datasets from GISAID and the European Nucleotide Archive (ENA), focusing on non-synonymous mutations in the spike protein. Only around 2.5% of the samples contained the wild-type spike protein with no variation from the reference. Among the spike protein mutants, we confirmed a low mutation rate, with fewer than 10 non-synonymous mutations in 99.6% of the analyzed sequences, but the mean and median number of spike protein mutations per sample increased over time. In total, 5,472 distinct variants were found. The majority of the observed variants were recurrent, but only 21 and 14 recurrent variants were found in at least 1% of the mutant genome assemblies and NGS samples, respectively. Further, we found high-confidence subclonal variants in about 2.6% of the NGS datasets with mutant spike protein, which might indicate co-infection with various SARS-CoV-2 strains and/or intra-host evolution. Lastly, some variants might have an effect on antibody binding or T-cell recognition. These findings demonstrate the continuing importance of monitoring SARS-CoV-2 sequences for early detection of variants that require adaptations in preventive and therapeutic strategies.
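The summary statistics quoted in the abstract (share of wild-type samples, share of mutants with fewer than 10 mutations, number of distinct and recurrent variants, variants present in at least 1% of mutant samples) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `summarize_spike_variants`, the input layout (a dict from sample ID to a set of variant labels), and the reading of "recurrent" as "seen in more than one sample" are all assumptions made for illustration.

```python
from collections import Counter


def summarize_spike_variants(samples, min_fraction=0.01):
    """Summarize non-synonymous spike variants across samples.

    samples: dict mapping a sample ID to a set of variant labels
             (hypothetical layout; an empty set means wild-type spike).
    min_fraction: share of mutant samples a variant must reach to be
                  reported as frequent (0.01 mirrors the 1% threshold).
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Samples carrying at least one non-synonymous spike variant.
    mutants = {sid: v for sid, v in samples.items() if v}
    n_mut = len(mutants)

    # Number of mutant samples in which each variant occurs.
    counts = Counter(v for variants in mutants.values() for v in variants)

    return {
        "wild_type_fraction": (len(samples) - n_mut) / len(samples),
        "mutants_under_10_fraction": (
            sum(len(v) < 10 for v in mutants.values()) / n_mut if n_mut else 0.0
        ),
        "distinct_variants": len(counts),
        # Assumption: "recurrent" means observed in more than one sample.
        "recurrent_variants": sorted(v for v, n in counts.items() if n > 1),
        "frequent_variants": sorted(
            v for v, n in counts.items() if n / n_mut >= min_fraction
        ),
    }
```

Called on a toy input such as `{"s1": set(), "s2": {"D614G"}, "s3": {"D614G", "N501Y"}}`, the function reports one wild-type sample out of three and `D614G` as the only recurrent variant.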

Article activity feed

  1. SciScore for 10.1101/2021.02.04.429765:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Pairwise alignments to the reference surface glycoprotein (NC_045512.2_cds_YP_009724390.1_3) were performed to extract the S gene sequences from GISAID samples using the R package Biostrings (version 2.52.0).
    Biostrings
    suggested: (Biostrings, RRID:SCR_016949)
    NGS data processing: All available NGS data for SARS-CoV-2 was downloaded on October 14th, 2020 from the NCBI Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/; Leinonen et al. 2011) and filtered for whole genome fastq data from Illumina instruments with a human sample background.
    NCBI Sequence Read Archive
    suggested: (NCBI Sequence Read Archive (SRA), RRID:SCR_004891)
    Output files in SAM format were sorted and converted to their binary form (BAM) using SAMtools (version 0.1.16) (Li et al. 2009).
    SAMtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Variants were retrieved from the alignment files using BCFtools (version 1.9) mpileup (http://samtools.github.io/bcftools/) with the options to recalculate per-base alignment quality on the fly, disabling the maximum per-file depth, and retention of anomalous read pairs.
    BCFtools
    suggested: (SAMtools/BCFtools, RRID:SCR_005227)
    Variants in gene gp02 (i.e. S gene) were annotated using SNPeff (version 4.3t) “ann” (Cingolani et al. 2012).
    SNPeff
    suggested: (SnpEff, RRID:SCR_005191)
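
The sentences quoted above describe extracting the S gene and annotating its variants with Biostrings, BCFtools, and SnpEff. As a rough illustration of what "non-synonymous" means at the end of that process, the following Python sketch compares two coding sequences codon by codon and reports amino-acid substitutions in the usual `D614G`-style notation. It does not reproduce the authors' pipeline or any of those tools: it assumes the two sequences are already aligned in frame, have equal length, and contain no insertions or deletions, and the function name `nonsynonymous_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nonsynonymous_changes(ref_cds, sample_cds):
    """Return amino-acid substitutions between two in-frame coding sequences.

    Assumes both sequences have the same length, a multiple of three,
    and are aligned without gaps. Codons containing ambiguous bases
    (for example "N") are skipped rather than guessed.
    """
    if len(ref_cds) != len(sample_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be equal-length and a multiple of 3")

    changes = []
    for start in range(0, len(ref_cds), 3):
        ref_aa = CODON_TABLE.get(ref_cds[start:start + 3].upper(), "X")
        alt_aa = CODON_TABLE.get(sample_cds[start:start + 3].upper(), "X")
        # Synonymous codon changes yield the same amino acid and are ignored.
        if "X" not in (ref_aa, alt_aa) and ref_aa != alt_aa:
            position = start // 3 + 1  # 1-based amino-acid position
            changes.append(f"{ref_aa}{position}{alt_aa}")
    return changes
```

For example, comparing `"ATGGATTTT"` with `"ATGGGTTTC"` reports only the second codon (GAT to GGT, aspartate to glycine), because the third-codon change TTT to TTC still encodes phenylalanine. Real data with indels or low-coverage regions would need a proper alignment and variant-calling step such as the ones the quoted sentences describe.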

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.