Observed strong pervasive positive selection in the N-terminal domain, receptor-binding domain and furin-cleavage sites of SARS-CoV-2 Spike protein sampled from Zimbabwean COVID-19 patients

This article has been reviewed by the following groups


Abstract

Mutations, primarily in the Spike (S) gene, have driven the emergence of many SARS-CoV-2 variants, such as Alpha, Beta, Delta and Omicron. The emergence of these variants has also caused several COVID-19 pandemic waves, which have affected human lives in different ways because of the restriction measures put in place to curb the spread of the virus. In this study, evolutionary patterns in SARS-CoV-2 sequences from samples collected from Zimbabwean COVID-19 patients were investigated. High-coverage SARS-CoV-2 whole-genome sequences were downloaded from the GISAID database, along with the GISAID S gene reference sequence. The Biopython, NumPy and Pandas data science packages were used to load, slice and clean the whole-genome sequences, producing a FASTA file of approximate S gene sequences. The sliced dataset was then aligned with the GISAID reference sequence in Jalview 2.11.1.3 to obtain the exact S gene sequences. Recombination signals were investigated using RDP 4.1, and pervasive selection in the S gene was investigated using the FUBAR algorithm hosted on the Datamonkey webserver. The Matplotlib and Seaborn Python packages were used for data visualisation. A plot of the Bayes factor for the hypothesis that the non-synonymous substitution rate exceeds the synonymous substitution rate (β > α) at each S protein site showed three peaks with strong evidence of divergence. These three diverging sites correspond to the amino acid changes D142G, D614G and P681R. None of the nine RDP methods, which use different approaches to detect recombination signals, found evidence of recombination. This study can help guide drug, vaccine and diagnostic development toward better control of the pandemic. It can also inform non-biological interventions as the changes in viral characteristics driven by the observed evolutionary patterns become better understood.
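The per-site Bayes factor for β > α compares how much the data shift the odds that a codon is under diversifying selection. A minimal sketch of that calculation in plain Python follows. It assumes the standard definition (Bayes factor = posterior odds divided by prior odds) and an assumed posterior-probability cutoff of 0.9, a value commonly used with FUBAR. The function names, the prior and the per-site probabilities are hypothetical toy values, not Datamonkey output or results from this study.

```python
# Minimal sketch: per-site Bayes factor for beta > alpha, assuming
# BF = posterior odds / prior odds. All inputs below are hypothetical toy values.


def bayes_factor(posterior: float, prior: float) -> float:
    """Return the Bayes factor for beta > alpha at one codon site."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if posterior >= 1.0:
        return float("inf")  # posterior certainty gives unbounded odds
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


def flag_sites(posteriors: dict[int, float], prior: float,
               min_posterior: float = 0.9) -> list[tuple[int, float]]:
    """Return (codon site, Bayes factor) for sites at or above the cutoff."""
    flagged = []
    for site, post in sorted(posteriors.items()):
        if post >= min_posterior:
            flagged.append((site, bayes_factor(post, prior)))
    return flagged


# Hypothetical posterior probabilities P(beta > alpha | data), keyed by codon.
toy_posteriors = {141: 0.12, 142: 0.97, 614: 0.99, 681: 0.95, 700: 0.40}
for site, bf in flag_sites(toy_posteriors, prior=0.5):
    print(f"codon {site}: BF = {bf:.1f}")
```

With a prior of 0.5 the prior odds equal 1, so the Bayes factor reduces to the posterior odds; in this toy input, codons 142, 614 and 681 pass the cutoff, and plotting the Bayes factor against codon position would show them as peaks.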

Article activity feed

  1. SciScore for 10.1101/2022.04.27.22274357

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The approximate S gene was extracted from whole genome of SARS-CoV-2 using Biopython, Panel Dataframes (Pandas) and Numerical python (Numpy) Data Science packages. | Biopython, suggested: (Biopython, RRID:SCR_007173); Numpy, suggested: (NumPy, RRID:SCR_008633)
    The dataset was aligned using Clustal Omega Multiple sequence Alignment tool in Jalview 2.11.1.3 (Waterhouse et al. 2009). | Jalview, suggested: (Jalview, RRID:SCR_006459)
    MEGA X (Kumar et al., 2018) was then used for codon alignment of the dataset using Clustal Omega, a pre-requisite process for investigating Pervasive Selection using FUBAR (Murell et al. 2013). | Clustal Omega, suggested: (Clustal Omega, RRID:SCR_001591)
    Python Data Science packages used for patient data manipulation were Numpy and Pandas and for Data Visualisation were Seaborn and Matplotlib. | Python, suggested: None; Matplotlib, suggested: (MatPlotLib, RRID:SCR_008624)
    3.4: Substitution Analysis: The codon-aligned S gene dataset was uploaded onto the DataMonkey webserver for pervasive selection analysis using FUBAR. | DataMonkey, suggested: (DataMonkey, RRID:SCR_010278)
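    The S gene extraction step, and the naming of codon-level changes such as D614G, can be sketched in dependency-free Python. This is not the authors' Biopython/Pandas code. It assumes the S gene spans positions 21563 to 25384 of the Wuhan-Hu-1 reference (NC_045512.2), uses a hypothetical 100-base padding to cut an approximate window (exact boundaries would still come from alignment to the reference, as in the table), and names amino acid changes from already-aligned, in-frame coding sequences. The helper names and toy sequences are illustrative only.

```python
# Sketch: slice an approximate Spike (S) window from a whole genome and name
# codon-level amino acid changes (e.g. D614G). Plain Python replaces
# Biopython/Pandas here so the example has no dependencies.

# Standard genetic code, built from the compact TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# S gene coordinates in Wuhan-Hu-1 (NC_045512.2), 1-based and inclusive.
S_START, S_END = 21563, 25384


def approximate_s_window(genome: str, pad: int = 100) -> str:
    """Return the S gene region plus `pad` flanking bases on each side.

    The padding (a hypothetical default) tolerates indels upstream of S that
    shift coordinates; exact boundaries are then fixed by alignment.
    """
    start = max(0, S_START - 1 - pad)
    return genome[start:S_END + pad]


def call_codon_changes(ref_cds: str, query_cds: str) -> list[str]:
    """Name amino acid changes between two aligned, in-frame coding sequences."""
    changes = []
    for offset in range(0, min(len(ref_cds), len(query_cds)) - 2, 3):
        ref_aa = CODON_TABLE.get(ref_cds[offset:offset + 3].upper(), "X")
        qry_aa = CODON_TABLE.get(query_cds[offset:offset + 3].upper(), "X")
        # Codons with gaps or ambiguous bases map to "X" and are skipped.
        if ref_aa != qry_aa and "X" not in (ref_aa, qry_aa):
            changes.append(f"{ref_aa}{offset // 3 + 1}{qry_aa}")
    return changes


# Toy coding sequences (not real Spike data): codon 2 GAT->GGT, codon 4 CCT->CGT.
print(call_codon_changes("ATGGATAAACCTTAA", "ATGGGTAAACGTTAA"))  # ['D2G', 'P4R']
```

    The same pattern, GAT to GGT and CCT to CGT, underlies D614G and P681R at codons 614 and 681 of the real S gene. Padding the slice trades a slightly longer sequence for robustness to upstream indels.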

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding formulaic information scattered throughout a paper and presenting it in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (Research Resource Identifiers) and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including the references cited, please see the SciScore website.
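    The RRID check described above can be illustrated with a short, hypothetical sketch: extract identifiers of the form RRID:SCR_007173 with a regular expression, then look each one up in a registry. SciScore's actual implementation is not described here. The simplified pattern and the in-memory KNOWN_RRIDS dictionary (seeded with the RRIDs suggested in Table 2) are assumptions standing in for a real registry query.

```python
import re
from typing import Optional

# Simplified, hypothetical RRID pattern: "RRID:" + prefix + "_" + accession,
# e.g. "RRID:SCR_007173". Real RRIDs use several prefixes (SCR_, AB_, CVCL_, ...).
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:]+)")

# Stand-in for a registry lookup, seeded with the RRIDs suggested in Table 2.
KNOWN_RRIDS = {
    "SCR_007173": "Biopython",
    "SCR_008633": "NumPy",
    "SCR_006459": "Jalview",
    "SCR_001591": "Clustal Omega",
    "SCR_008624": "MatPlotLib",
    "SCR_010278": "DataMonkey",
}


def check_rrids(text: str) -> dict[str, Optional[str]]:
    """Map each RRID found in `text` to its resource name, or None if unknown."""
    return {rrid: KNOWN_RRIDS.get(rrid) for rrid in RRID_PATTERN.findall(text)}


sentence = "Biopython (RRID:SCR_007173) and an unlisted tool (RRID:SCR_999999)."
print(check_rrids(sentence))  # {'SCR_007173': 'Biopython', 'SCR_999999': None}
```

    A None value marks an identifier that is well-formed but not found in the lookup, which is the kind of correctness issue such a check would surface for a reviewer.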