Analysis of SARS-CoV-2 Mutations Over Time Reveals Increasing Prevalence of Variants in the Spike Protein and RNA-Dependent RNA Polymerase

Abstract

Amid the ongoing COVID-19 pandemic, it has become increasingly important to monitor the mutations that arise in the SARS-CoV-2 virus, to prepare public health strategies and guide the further development of vaccines and therapeutics. The spike (S) protein and the proteins comprising the RNA-Dependent RNA Polymerase (RdRP) are key vaccine and drug targets, respectively, making mutation surveillance of these proteins of great importance.

Full protein sequences for the spike protein and the RNA-dependent RNA polymerase proteins were downloaded from the GISAID database and aligned, and variants were identified. Polymorphisms in the protein sequences were investigated at the protein structural level and examined longitudinally to identify sequence and strain variants emerging over time. Our analysis revealed a group of variants in the spike protein and the polymerase complex that appeared in August and account for around five percent of the genomes analyzed up to the last week of October. A structural analysis also facilitated investigation of several unique variants in the receptor binding domain and the N-terminal domain of the spike protein, with high-frequency mutations occurring more commonly in these regions. The identification of new variants emphasizes the need for further study of the effects of these mutations and the implications of their increased prevalence, particularly as these mutations may impact vaccine or therapeutic efficacy.

SciScore for 10.1101/2021.03.05.433666:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The reference genome used in our analysis was the Severe Acute Respiratory Syndrome Coronavirus 2 Isolate WIV04 (WIV04), sequenced in Wuhan, China on December 30th, 2019 [12]. The raw FASTA file was split by protein into 27 files using a Python script in Jupyter Notebook (version 6.1.4) [13], and each protein was processed separately through all subsequent steps.	Python suggested: (IPython, RRID:SCR_001658)
Filtering of Sequences: Sequences were filtered in Python using the Biopython SeqIO module.	Biopython suggested: (Biopython, RRID:SCR_007173)
Sequence Dereplication: In order to streamline our computational pipeline, identical sequences were condensed into clusters using USEARCH (version 11.0.667) [15]. Clusters, representing unique sequences, were written out to a FASTA file with the ID of the cluster and the number of sequences in the cluster.	USEARCH suggested: (mubiomics, RRID:SCR_006785)
Clustal Omega was selected based on the balance between alignment quality and speed.	Clustal Omega suggested: (Clustal Omega, RRID:SCR_001591)
Parsing of Multiple Sequence Alignment: A Python script was developed in Jupyter notebook to automatically parse the aligned sequences for variants given the ID of the cluster containing the reference sequence, which was determined by searching for “WIV04” in the cluster information file using RStudio (version 1.3.1093) [18]. The Python script scanned through the other clusters (Supplementary Figure S1), comparing each codon with the corresponding codon of the reference cluster.	RStudio suggested: (RStudio, RRID:SCR_000432)
Three-dimensional Visualization of Frequently Mutated Sites: Structures of the spike protein and the RNA-dependent RNA polymerase (RdRP) complex were downloaded from the Protein Data Bank (PDB) [20] and visualized using PyMOL.	PyMOL suggested: (PyMOL, RRID:SCR_000305)
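
The dereplication and alignment-parsing steps quoted in the table above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the sequences are already aligned protein strings of equal length, uses the standard-library Counter as a stand-in for USEARCH clustering, and the function names (dereplicate, call_variants, variant_counts) are hypothetical. Gap columns and ambiguous residues ("X") are skipped, which is a simplifying assumption rather than something the source states.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count) clusters, largest first.

    Stand-in for the USEARCH dereplication step; assumes exact string identity.
    """
    return Counter(sequences).most_common()


def call_variants(reference, aligned, gap="-"):
    """List substitutions in an aligned sequence relative to the aligned reference.

    Positions are 1-based and counted along the ungapped reference.
    Columns with a gap in either sequence, or an ambiguous 'X', are skipped.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0
    for ref_res, res in zip(reference, aligned):
        if ref_res != gap:
            ref_pos += 1  # advance only on real reference residues
        if ref_res == gap or res == gap or res == "X":
            continue
        if res != ref_res:
            variants.append(f"{ref_res}{ref_pos}{res}")
    return variants


def variant_counts(reference, clusters):
    """Weight each variant by the size of the cluster that carries it."""
    counts = Counter()
    for seq, size in clusters:
        for variant in call_variants(reference, seq):
            counts[variant] += size
    return counts


if __name__ == "__main__":
    # Toy data, not real spike or RdRP sequences.
    reference = "MFVD"
    clusters = dereplicate(["MFVD", "MFVG", "MFVG", "MLVD"])
    print(variant_counts(reference, clusters))
```

Weighting by cluster size means each unique sequence is compared to the reference only once, while variant frequencies still reflect the number of genomes carrying them.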

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
