Evolutionary analysis of SARS-CoV-2 spike protein for its different clades

Abstract

Objective

The spike protein of SARS-CoV-2 has become the main target for antiviral and vaccine development. Despite its relevance, there is scarce information about its evolutionary traces. The aim of this study was to investigate the diversification patterns of the spike for each clade of SARS-CoV-2 through different approaches.

Methods

A total of 2,100 sequences representing the seven clades of SARS-CoV-2 were included. Patterns of genetic diversification and the nucleotide evolutionary rate were estimated for the spike genomic region.

Results

The haplotype networks showed a star shape, in which multiple haplotypes with few nucleotide differences diverge from a common ancestor. A total of 479 different haplotypes were defined across the seven analyzed clades. The main haplotype, named Hap-1, was the most frequent for clades G (54%), GH (54%), and GR (56%), and a different haplotype (named Hap-252) was the most frequent for clades L (63.3%), O (39.7%), S (51.7%), and V (70%). The evolutionary rate for the spike protein was estimated as 1.08 × 10⁻³ nucleotide substitutions/site/year. Moreover, the nucleotide evolutionary rate after nine months of the pandemic was similar for each clade.

Conclusions

The present evolutionary analysis is relevant because the spike protein of SARS-CoV-2 is the target for most therapeutic candidates; in addition, changes in this protein could affect viral transmission, response to antivirals, and vaccine efficacy. Moreover, the evolutionary characterization of clades improves knowledge of SARS-CoV-2 and deserves to be assessed in more detail, since re-infection by different phylogenetic clades has been reported.

Article activity feed

  1. SciScore for 10.1101/2020.11.24.396671:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: 2.1 Datasets: In order to generate datasets representing different geographic regions and time evolution for each of the seven clades of SARS-CoV-2, from December 2019 to September 2020, data of complete genome sequences available at GISAID (https://www.gisaid.org/) on September 2020 were randomly monthly collected for several geographic regions.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
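The randomization sentence above describes drawing sequences at random per month for each clade. A minimal sketch of that kind of stratified sampling is shown here. It is not the authors' procedure: the record keys (`id`, `clade`, `month`), the function name, and the `per_stratum` quota are all hypothetical, and the source does not state how many sequences were drawn per group.

```python
import random
from collections import defaultdict


def sample_by_clade_and_month(records, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each (clade, month) group.

    `records` is an iterable of dicts with the hypothetical keys
    'id', 'clade' and 'month'; these names are assumptions for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[(record["clade"], record["month"])].append(record)

    sampled = []
    for key in sorted(strata):  # sorted for a stable iteration order
        group = strata[key]
        k = min(per_stratum, len(group))  # small groups are taken whole
        sampled.extend(rng.sample(group, k))
    return sampled
```

Geographic region could be added to the grouping key in the same way if the draw should also be balanced by region.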

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Complete genomes were aligned using MAFFT against the Wuhan-Hu-1 reference genome (NC_045512.2, EPI_ISL_402125).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The resulting multiple sequence alignments were split in a dataset corresponding to the S region [3,822nt (21,563-25,384)] and RBD (included in S) [762nt (22,550-23,311)]. 2.2 Phylogenetic and statically analysis / Genetic characterization: Patterns of genetic diversifications for both genomic regions S and RBD for each clade were analyzed using the median-joining reconstruction method at the PopART v1.7.2 software (Leigh & Bryant,
    PopART
    suggested: None
    Haplotypes shared among all clades were analyzed in Arlequin 3.5.2.2 software (Excoffier & Lischer, 2010)
    Arlequin
    suggested: (ARLEQUIN, RRID:SCR_009051)
    2.3 Nucleotide evolutionary rate: The estimation of the nucleotide evolutionary rate for the entire S-coding region datasets were carried out with the Beast v1.8.4 program package (Suchard et al. 2018) at the CIPRES Science Gateway server (Miller et al. 2010).
    Beast
    suggested: (BEAST, RRID:SCR_010228)
    The best nucleotide substitution model was selected according to the Bayesian information criterion (BIC) method in IQ-TREE v1.6.12 software (Kalyaanamoorthy et al. 2017).
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
    The convergence of the “meanRate” and “allMus” parameters [effective sample size (ESS) ≥ 200, burn-in 10%] was verified with Tracer v1.7.1 (Rambaut et al. 2018).
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
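The table above quotes reference coordinates for the S region (21,563-25,384) and the RBD (22,550-23,311) and mentions grouping sequences into haplotypes. The sketch below illustrates both steps in plain Python under stated assumptions: the alignment is a dict of equal-length strings, one row is the aligned reference, and a haplotype is simply a distinct string within the region. The function names are hypothetical, and PopART and Arlequin may treat gaps and ambiguous bases differently.

```python
from collections import Counter

# 1-based inclusive reference coordinates quoted in the table above.
S_REGION = (21563, 25384)    # 3,822 nt
RBD_REGION = (22550, 23311)  # 762 nt


def region_slice(aligned_ref, start, end):
    """Map reference positions start..end to a slice of alignment columns.

    Gap characters ('-') in the aligned reference do not consume a reference
    position, so columns are found by counting non-gap characters.
    """
    first = last = None
    position = 0  # 1-based reference position of the last non-gap base
    for column, base in enumerate(aligned_ref):
        if base == "-":
            continue
        position += 1
        if position == start:
            first = column
        if position == end:
            last = column
            break
    if first is None or last is None:
        raise ValueError("region lies outside the aligned reference")
    return slice(first, last + 1)


def haplotype_counts(aligned_seqs, region):
    """Count distinct sequences (case-insensitive) within an alignment slice."""
    return Counter(seq[region].upper() for seq in aligned_seqs.values())
```

With a real alignment, `region_slice(reference_row, *S_REGION)` would give the columns to pass to `haplotype_counts`; dividing each count by the number of sequences in a clade gives per-clade haplotype frequencies of the kind reported in the abstract.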

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The inclusion in the study of only 2,100 of the 73,393 available sequences on September 2020 is a limitation that imply a bias in the obtained results, although the sequence selection process was carefully carried out in order to generate a representative dataset from different time courses and a wide geographic range.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.