In depth analysis of Cyprus-specific mutations of SARS-CoV-2 strains using computational approaches

Abstract

Background

This study aims to characterize SARS-CoV-2 mutations that are prevalent primarily in the Cypriot population. Using computational approaches, we also assess whether these mutations are associated with changes in viral virulence.

Methods

We use genetic data from 144 sequences of SARS-CoV-2 strains from the Cypriot population, obtained between March 2020 and January 2021, together with all data available from GISAID. We combine these with country-level information, such as deaths and cases per million and the response times of COVID-19-related public health austerity measures. Initial indications of a selective advantage for Cyprus-specific mutations are obtained by mutation tracking analysis: we calculate the frequency of each mutation within the Cypriot population and compare it with its worldwide prevalence over the course of the pandemic. We further use linear regression models to extrapolate additional information that standard statistical analysis may miss.
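The mutation tracking step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record layout (country, collection month, set of mutations), the function name `mutation_frequency`, and the toy records are all assumptions made for the example, and the toy data are invented rather than taken from GISAID.

```python
from collections import defaultdict

def mutation_frequency(records, mutation, focus_country="Cyprus"):
    """Return {(region, month): fraction of sequences carrying `mutation`}.

    `records` is an iterable of (country, month, mutations) tuples, where
    `mutations` is a set of mutation labels (hypothetical layout).
    """
    carrying = defaultdict(int)
    total = defaultdict(int)
    for country, month, mutations in records:
        # Collapse every other country into a single comparison group.
        region = focus_country if country == focus_country else "Rest of world"
        key = (region, month)
        total[key] += 1
        if mutation in mutations:
            carrying[key] += 1
    return {key: carrying[key] / total[key] for key in total}

# Invented toy records, for illustration only.
records = [
    ("Cyprus", "2020-04", {"S6059F"}),
    ("Cyprus", "2020-04", {"S6059F", "D614G"}),
    ("Cyprus", "2020-04", {"D614G"}),
    ("Italy", "2020-04", {"D614G"}),
    ("Germany", "2020-04", {"S6059F"}),
    ("Germany", "2020-04", set()),
    ("Spain", "2020-04", {"D614G"}),
]

freqs = mutation_frequency(records, "S6059F")
for (region, month), freq in sorted(freqs.items()):
    print(f"{month}  {region:<14} {freq:.2f}")
```

Comparing the two series month by month shows whether a mutation is enriched locally relative to its global prevalence; a formal enrichment test would be a separate step.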

Results

We report a single mutation in the ORF1ab gene (nucleotide position 18,440) that appears to be significantly enriched within the Cypriot population. The resulting amino acid change, S6059F, maps to the SARS-CoV-2 NSP14 protein. We further analyse this mutation using regression models to investigate possible associations with increased deaths and cases per million. Moreover, protein structure prediction tools show that the mutation confers a conformational change that significantly alters the protein's structure when compared to the reference protein.
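The correspondence between nucleotide position 18,440 and residue 6059 can be checked with a short calculation. The sketch below assumes the ORF1ab coordinates of the Wuhan-Hu-1 reference (NC_045512.2) as commonly annotated, namely a coding region joined as 266..13468 and 13468..21555 across the -1 ribosomal frameshift; these coordinates and the function name `orf1ab_codon` are not given in the abstract and are assumptions of this example.

```python
# Assumed ORF1ab annotation for NC_045512.2: join(266..13468, 13468..21555).
ORF1AB_START = 266
FRAMESHIFT_SITE = 13468  # this nucleotide is read twice (-1 frameshift)
ORF1AB_END = 21555

def orf1ab_codon(nt_pos):
    """Map a 1-based genome position to (pp1ab residue number, position in codon)."""
    if not ORF1AB_START <= nt_pos <= ORF1AB_END:
        raise ValueError("position lies outside ORF1ab")
    # Complete codons translated before the frameshift: 13203 nt / 3 = 4401.
    codons_before_shift = (FRAMESHIFT_SITE - ORF1AB_START + 1) // 3
    if nt_pos < FRAMESHIFT_SITE:
        offset = nt_pos - ORF1AB_START
        residue = offset // 3 + 1
    else:
        # After the slip, the reading frame restarts at the frameshift site,
        # so position 13468 is reported here as the first base of codon 4402.
        offset = nt_pos - FRAMESHIFT_SITE
        residue = codons_before_shift + offset // 3 + 1
    return residue, offset % 3 + 1

residue, codon_position = orf1ab_codon(18440)
print(residue, codon_position)  # prints "6059 2"
```

Under these assumptions the position falls on the second base of codon 6059, which agrees with the S6059F notation used above.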

Conclusions

Investigating Cyprus-specific mutations of SARS-CoV-2 can lead to a better understanding of viral pathogenicity. Researching these mutations can reveal potential links between virus-specific mutations and the unique genomics of the Cypriot population. This can not only yield findings that help fight the pandemic at a national level, but also provide insights into viral virulence worldwide.

Article activity feed

  1. SciScore for 10.1101/2021.06.08.447477:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Raw data analysis: The Burrows-Wheeler Aligner (BWA) [11], version: 0.7.15 was used to map the raw reads to Wuhan-Hu-1 (NCBI ID:NC_045512.2)
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Duplicate reads, which are likely to be the result of PCR bias, were marked using Picard (http://broadinstitute.github.io/picard/), version: 2.6.0.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    SAMtools [12], version: 0.1.19, was used for additional BAM/SAM file manipulations.
    SAMtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Finally, the GATK FastaAlternateReferenceMaker method was used for consensus sequence extraction from the vcf files.
    GATK
    suggested: (GATK, RRID:SCR_001876)
    MAFFT [17] was used to construct a multiple sequence alignment (MSA).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Phylogeny was estimated using the RAxML [18] maximum likelihood algorithm for phylogenetic tree construction.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    Analysis was performed using R (packages: dplyr, tidyr, ggplot2, ggtree, phytools, phangorn).
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    I-TASSER was selected for protein structure modelling, since it outperformed other servers according to results from the 14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP14) (https://zhanglab.ccmb.med.umich.edu/casp14/, last accessed 23/03/2021).
    I-TASSER
    suggested: (I-TASSER, RRID:SCR_014627)
    The DynaMut webserver [21] was used to visualize non-covalent molecular interactions, calculated by the Arpeggio algorithm [22].
    Arpeggio
    suggested: (Arpeggio, RRID:SCR_010876)
    Structural alignment was performed using the align tool of PyMOL, and all-atom RMSD values were calculated without outlier rejection (zero cycles of refinement).
    PyMOL
    suggested: (PyMOL, RRID:SCR_000305)

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.