Evolutionary analysis of SARS-CoV-2 spike protein for its different clades

Abstract

Objective

The spike protein of SARS-CoV-2 has become the main target for antiviral and vaccine development. Despite its relevance, information about its evolutionary trajectory is scarce. The aim of this study was to investigate the diversification patterns of the spike for each clade of SARS-CoV-2 through different approaches.

Methods

Two thousand one hundred sequences representing the seven clades of SARS-CoV-2 were included. Patterns of genetic diversification and the nucleotide evolutionary rate were estimated for the spike genomic region.

Results

The haplotype networks showed a star shape, in which multiple haplotypes with few nucleotide differences diverge from a common ancestor. Four hundred seventy-nine different haplotypes were defined in the seven analyzed clades. The main haplotype, named Hap-1, was the most frequent for clades G (54%), GH (54%), and GR (56%), and a different haplotype (named Hap-252) was the most frequent for clades L (63.3%), O (39.7%), S (51.7%), and V (70%). The evolutionary rate for the spike protein was estimated as 1.08 × 10⁻³ nucleotide substitutions/site/year. Moreover, the nucleotide evolutionary rate after nine months of the pandemic was similar for each clade.
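To make the haplotype percentages above concrete, the following minimal Python sketch collapses identical aligned sequences into haplotypes, reports each haplotype's share, and counts how many nucleotide differences separate each one from the most frequent haplotype (the centre of a star-shaped network). It is an illustration only, not the study's procedure: the toy sequences are invented, the function names `haplotype_frequencies` and `hamming` are hypothetical, and the sketch assumes the sequences are already aligned, of equal length, and free of ambiguous bases.

```python
from collections import Counter

def haplotype_frequencies(sequences):
    """Collapse identical aligned sequences into haplotypes.

    Returns a list of (haplotype, count, percent) tuples, most frequent first.
    """
    if not sequences:
        return []
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(s.upper() for s in sequences)
    total = len(sequences)
    return [(hap, n, 100.0 * n / total) for hap, n in counts.most_common()]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Toy data (invented): five sequences that collapse into three haplotypes.
toy = ["ACGT", "ACGT", "ACGT", "ACGA", "TCGT"]
result = haplotype_frequencies(toy)

# Distance of every haplotype from the most frequent one.
main_haplotype = result[0][0]
distances = {hap: hamming(main_haplotype, hap) for hap, _, _ in result}
```

In a star-shaped network, most entries in `distances` would be small, which is the pattern the Results describe.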

Conclusions

The present evolutionary analysis is relevant because the spike protein of SARS-CoV-2 is the target of most therapeutic candidates; moreover, changes in this protein could affect viral transmission, response to antivirals, and vaccine efficacy. The evolutionary characterization of clades improves knowledge of SARS-CoV-2 and deserves more detailed assessment, since re-infection by different phylogenetic clades has been reported.

SciScore for 10.1101/2020.11.24.396671

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	2.1 Datasets: To generate datasets representing different geographic regions and the temporal evolution of each of the seven clades of SARS-CoV-2 from December 2019 to September 2020, complete genome sequences available at GISAID (https://www.gisaid.org/) in September 2020 were randomly collected, month by month, for several geographic regions.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.
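The Randomization entry above describes sequences drawn at random per month and per geographic region for each clade. One way such a draw could be organised is sketched below in Python. This is an assumption-laden illustration, not the authors' procedure: the source does not state quotas per stratum, and the record fields (`clade`, `month`, `region`), the function name `sample_by_stratum`, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict

def sample_by_stratum(records, per_stratum, seed=0):
    """Randomly pick up to `per_stratum` records from each
    (clade, month, region) group, using a fixed seed for reproducibility."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["clade"], rec["month"], rec["region"])].append(rec)
    chosen = []
    for key in sorted(groups):  # fixed order so a given seed gives the same draw
        members = groups[key]
        k = min(per_stratum, len(members))  # small strata are taken whole
        chosen.extend(rng.sample(members, k))
    return chosen

# Toy metadata (invented): five records in one stratum, one in another.
toy_records = [
    {"id": f"seq{i}", "clade": "G", "month": "2020-03", "region": "Europe"}
    for i in range(5)
] + [{"id": "seq5", "clade": "S", "month": "2020-01", "region": "Asia"}]

picked = sample_by_stratum(toy_records, per_stratum=2, seed=1)
```

Fixing the seed makes the selection repeatable, which matters when a subsample is later cited as a limitation of the analysis.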

Table 2: Resources

Software and Algorithms
Sentences	Resources
Complete genomes were aligned using MAFFT against the Wuhan-Hu-1 reference genome (NC_045512.2, EPI_ISL_402125).	MAFFT suggested: (MAFFT, RRID:SCR_011811)
The resulting multiple sequence alignments were split into datasets corresponding to the S region [3,822nt (21,563-25,384)] and the RBD (included in S) [762nt (22,550-23,311)]. 2.2 Phylogenetic and statistical analysis / Genetic characterization: Patterns of genetic diversification for both genomic regions, S and RBD, were analyzed for each clade using the median-joining reconstruction method in the PopART v1.7.2 software (Leigh & Bryant,	PopART suggested: None
Haplotypes shared among all clades were analyzed in Arlequin 3.5.2.2 software (Excoffier & Lischer, 2010)	Arlequin suggested: (ARLEQUIN, RRID:SCR_009051)
2.3 Nucleotide evolutionary rate: The estimation of the nucleotide evolutionary rate for the entire S-coding region datasets was carried out with the Beast v1.8.4 program package (Suchard et al. 2018) at the CIPRES Science Gateway server (Miller et al. 2010).	Beast suggested: (BEAST, RRID:SCR_010228)
The best nucleotide substitution model was selected according to the Bayesian information criterion (BIC) method in IQ-TREE v1.6.12 software (Kalyaanamoorthy et al. 2017).	IQ-TREE suggested: (IQ-TREE, RRID:SCR_017254)
The convergence of the “meanRate” and “allMus” parameters [effective sample size (ESS) ≥ 200, burn-in 10%] was verified with Tracer v1.7.1 (Rambaut et al. 2018).	Tracer suggested: (Tracer, RRID:SCR_019121)
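The coordinates quoted in the table can be checked with a short Python sketch that slices the S and RBD regions out of an aligned genome using 1-based, inclusive positions. It is a minimal illustration under stated assumptions: the alignment columns are assumed to match Wuhan-Hu-1 numbering exactly (no insertion columns relative to the reference), the toy genome is invented filler of the reference length (29,903 nt), and the helper names are hypothetical. The last lines multiply the evolutionary rate reported in the abstract by the S-region length as a rough scale check only.

```python
# 1-based, inclusive coordinates on Wuhan-Hu-1 (NC_045512.2), as quoted in the table.
S_REGION = (21563, 25384)
RBD_REGION = (22550, 23311)

def region_length(region):
    """Length in nucleotides of a 1-based, inclusive (start, end) region."""
    start, end = region
    return end - start + 1

def slice_region(aligned_seq, region):
    """Return the columns start..end (1-based, inclusive) of an aligned sequence."""
    start, end = region
    if not (1 <= start <= end <= len(aligned_seq)):
        raise ValueError("region falls outside the sequence")
    return aligned_seq[start - 1:end]

# Toy stand-in for one aligned genome (invented content): 'C' marks the S region.
toy_genome = "A" * 21562 + "C" * 3822 + "G" * (29903 - 25384)
spike = slice_region(toy_genome, S_REGION)
rbd = slice_region(toy_genome, RBD_REGION)

# Rough scale check: rate from the abstract (substitutions/site/year)
# multiplied by the number of sites in the S region.
RATE = 1.08e-3
expected_subs_per_year = RATE * region_length(S_REGION)
```

Under these assumptions the region lengths come out at 3,822 nt and 762 nt, matching the table, and the reported rate corresponds to roughly four substitutions per year across the whole S region.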

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

The inclusion in the study of only 2,100 of the 73,393 sequences available in September 2020 is a limitation that implies a bias in the obtained results, although the sequence selection process was carefully carried out in order to generate a representative dataset from different time courses and a wide geographic range.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
