Evolutionary analysis of SARS-CoV-2 spike protein for its different clades
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Objective
The spike protein of SARS-CoV-2 has become the main target for antiviral and vaccine development. Despite its relevance, little information is available about its evolutionary history. This study aimed to investigate the diversification patterns of the spike protein in each SARS-CoV-2 clade using several complementary approaches.
Methods
A total of 2,100 sequences representing the seven clades of SARS-CoV-2 were included. Patterns of genetic diversification and the nucleotide evolutionary rate were estimated for the spike genomic region.
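The spike (S) and receptor-binding domain (RBD) regions are defined in the methods quoted in the SciScore report by 1-based, inclusive coordinates on the Wuhan-Hu-1 reference (S: 21,563-25,384; RBD: 22,550-23,311). A minimal sketch of cutting those regions out of reference-aligned genomes is shown here; it assumes that alignment columns corresponding to insertions relative to the reference have already been removed, which the source does not state. The function and constant names are hypothetical.

```python
# Region coordinates as reported in the methods: 1-based, inclusive positions
# on the Wuhan-Hu-1 reference genome (NC_045512.2).
REGIONS = {
    "S": (21563, 25384),    # spike coding region
    "RBD": (22550, 23311),  # receptor-binding domain, inside S
}


def region_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based inclusive interval."""
    return end - start + 1


def extract_region(aligned_genome: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive) of a genome aligned
    to the reference.

    Assumption: columns that are insertions relative to the reference have
    been removed, so column i of the alignment is reference position i.
    """
    if not 1 <= start <= end <= len(aligned_genome):
        raise ValueError("region lies outside the aligned genome")
    # Python slices are 0-based and end-exclusive, hence [start - 1:end].
    return aligned_genome[start - 1:end]


# The RBD interval should sit entirely inside the S interval.
s_start, s_end = REGIONS["S"]
rbd_start, rbd_end = REGIONS["RBD"]
assert s_start <= rbd_start and rbd_end <= s_end

for name, (start, end) in REGIONS.items():
    print(name, region_length(start, end))  # S 3822, RBD 762
```

The computed lengths match the 3,822 nt and 762 nt stated in the methods. With a real alignment, `extract_region(aligned_seq, *REGIONS["S"])` would return the spike segment for one genome.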
Results
The haplotype networks showed a star-like shape, in which multiple haplotypes differing by only a few nucleotides diverge from a common ancestor. In total, 479 different haplotypes were identified across the seven analyzed clades. The main haplotype, named Hap-1, was the most frequent in clades G (54%), GH (54%), and GR (56%), whereas a different haplotype (Hap-252) predominated in clades L (63.3%), O (39.7%), S (51.7%), and V (70%). The evolutionary rate of the spike coding region was estimated at 1.08 × 10⁻³ nucleotide substitutions/site/year. Moreover, after nine months of the pandemic, the nucleotide evolutionary rate was similar across clades.
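To put the reported rate in concrete terms, it can be multiplied by the length of the S region (3,822 nt, from the methods) and by elapsed time. This is a back-of-the-envelope sketch that assumes a single constant rate along one lineage; the study's BEAST analysis may model rate variation differently, and the names below are hypothetical.

```python
# Reported estimate for the spike (S) coding region.
RATE = 1.08e-3      # nucleotide substitutions per site per year
S_LENGTH = 3822     # nt, S region on the Wuhan-Hu-1 reference


def expected_substitutions(rate: float, sites: int, years: float) -> float:
    """Expected substitutions along one lineage, assuming a constant rate
    (a simplification; it ignores rate variation among lineages and sites)."""
    return rate * sites * years


per_year = expected_substitutions(RATE, S_LENGTH, 1.0)
nine_months = expected_substitutions(RATE, S_LENGTH, 0.75)
print(f"{per_year:.2f} substitutions per year, {nine_months:.2f} in nine months")
```

This gives roughly 4.1 substitutions per spike gene per year, or about 3.1 over the nine months covered by the dataset. Two sequences descending from a common ancestor would be expected to differ by roughly twice that amount, because each lineage accumulates changes independently.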
Conclusions
In conclusion, this evolutionary analysis is relevant because the spike protein of SARS-CoV-2 is the target of most therapeutic candidates, and changes in this protein could affect viral transmission, response to antivirals, and vaccine efficacy. Moreover, the evolutionary characterization of clades improves our knowledge of SARS-CoV-2 and deserves more detailed study, since re-infection by viruses from different phylogenetic clades has been reported.
SciScore for 10.1101/2020.11.24.396671
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Institutional Review Board Statement: not detected.
- Randomization: 2.1 Datasets: In order to generate datasets representing different geographic regions and the temporal evolution of each of the seven clades of SARS-CoV-2 from December 2019 to September 2020, complete genome sequences available at GISAID (https://www.gisaid.org/) in September 2020 were randomly sampled each month from several geographic regions.
- Blinding: not detected.
- Power Analysis: not detected.
- Sex as a biological variable: not detected.

Table 2: Resources
Software and Algorithms
- Sentence: Complete genomes were aligned using MAFFT against the Wuhan-Hu-1 reference genome (NC_045512.2, EPI_ISL_402125).
  Resource: MAFFT, suggested: (MAFFT, RRID:SCR_011811)
- Sentence: The resulting multiple sequence alignments were split into datasets corresponding to the S region [3,822 nt (21,563-25,384)] and the RBD (included in S) [762 nt (22,550-23,311)]. 2.2 Phylogenetic and statistical analysis / Genetic characterization: Patterns of genetic diversification in both genomic regions, S and RBD, were analyzed for each clade using the median-joining reconstruction method in PopART v1.7.2 software (Leigh & Bryant,
  Resource: PopART, suggested: None
- Sentence: Haplotypes shared among all clades were analyzed in Arlequin 3.5.2.2 software (Excoffier & Lischer, 2010).
  Resource: Arlequin, suggested: (ARLEQUIN, RRID:SCR_009051)
- Sentence: 2.3 Nucleotide evolutionary rate: The nucleotide evolutionary rate for the entire S-coding region datasets was estimated with the BEAST v1.8.4 program package (Suchard et al. 2018) on the CIPRES Science Gateway server (Miller et al. 2010).
  Resource: BEAST, suggested: (BEAST, RRID:SCR_010228)
- Sentence: The best nucleotide substitution model was selected according to the Bayesian information criterion (BIC) in IQ-TREE v1.6.12 software (Kalyaanamoorthy et al. 2017).
  Resource: IQ-TREE, suggested: (IQ-TREE, RRID:SCR_017254)
- Sentence: Convergence of the “meanRate” and “allMus” parameters [effective sample size (ESS) ≥ 200, burn-in 10%] was verified with Tracer v1.7.1 (Rambaut et al. 2018).
  Resource: Tracer, suggested: (Tracer, RRID:SCR_019121)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
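The haplotype analyses quoted in the report rely on PopART (median-joining networks) and Arlequin (haplotypes shared among clades). The sketch here does not reproduce either tool. It only illustrates, under the simplifying assumption that a haplotype is an exact string match of an aligned region (gaps and ambiguous bases treated as ordinary characters), how haplotype frequencies such as the reported share of Hap-1, and the set of haplotypes present in every clade, can be tallied. The clade labels come from the study; the sequences and function names are hypothetical toy data.

```python
from collections import Counter


def haplotype_frequencies(sequences):
    """Collapse identical aligned sequences into haplotypes and return
    (sequence, count, percentage) tuples, most frequent first."""
    counts = Counter(sequences)
    total = len(sequences)
    return [(seq, n, 100.0 * n / total) for seq, n in counts.most_common()]


def shared_haplotypes(by_clade):
    """Distinct sequences that occur in every clade of the mapping."""
    sets = [set(seqs) for seqs in by_clade.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy alignments (not data from the study).
toy = {
    "G": ["ACGT", "ACGT", "ACGA", "ACGT"],
    "L": ["ACGA", "ACGA", "ACGT"],
    "S": ["ACGA", "TCGA", "ACGT"],
}
for clade, seqs in toy.items():
    top_seq, top_n, top_pct = haplotype_frequencies(seqs)[0]
    print(clade, top_seq, top_n, f"{top_pct:.1f}%")
print(sorted(shared_haplotypes(toy)))
```

In this toy example the most common haplotype in clade G accounts for 75% of its sequences, and two haplotypes occur in all three clades. Real datasets would first need a decision on how to treat gaps and Ns, since exact matching counts a single ambiguous base as a new haplotype.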
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: The inclusion of only 2,100 of the 73,393 sequences available in September 2020 is a limitation that may bias the results, although the sequences were carefully selected to generate a representative dataset spanning different time periods and a wide geographic range.
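The limitation concerns sampling: 2,100 of 73,393 sequences is about 2.9% of what was available. The source describes the selection only as random monthly collection across geographic regions, so the following is just one possible way to draw such a dataset: stratified random sampling with a fixed cap per (clade, month, region) stratum. The record fields, the cap, and the function name are hypothetical assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each
    (clade, month, region) stratum.

    `records` is a list of dicts with the hypothetical keys
    "id", "clade", "month" and "region".
    """
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["clade"], rec["month"], rec["region"])].append(rec)
    sample = []
    for key in sorted(strata):  # deterministic stratum order
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Toy example: ten records split evenly between two clades in one month/region.
records = [
    {"id": f"seq{i}", "clade": "G" if i % 2 else "GR",
     "month": "2020-03", "region": "Europe"}
    for i in range(10)
]
picked = stratified_sample(records, per_stratum=3)
print(len(picked))  # 6: three from each of the two strata
```

Capping each stratum keeps heavily sequenced regions and months from dominating the dataset, at the cost of unequal sampling fractions between strata; that trade-off is one source of the bias the authors acknowledge.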
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.