Variant analysis of SARS-CoV-2 genomes in the Middle East

Abstract

Background

Coronavirus disease 2019 (COVID-19) emerged in late 2019 and has now caused over 26 million cases and 850,000 deaths. The Middle East has a death toll of ∼50,000, of which over 20,000 are in Iran, which has over 350,000 confirmed cases. We hypothesise that Iranian cases seeded outbreaks in the neighbouring countries and that variant mapping and phylogenetic analysis can be used to test this. We also aim to analyse the variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) to characterise the common genome variants and provide useful data for the global effort to prevent further spread of COVID-19.

Methods

We used bioinformatics methods, including multiple sequence alignment, variant calling and annotation, and phylogenetic analysis, to identify the genomic variants found in the region. The analysis covers 122 samples from the 13 countries of the Middle East, sourced from the Global Initiative on Sharing All Influenza Data (GISAID).
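
To make the variant-calling step concrete, the sketch below compares one aligned sample against an aligned reference and reports substitutions, insertions and deletions. It is an illustration only, not the authors' pipeline (which relied on Clustal Omega, Galaxy and SnpEff); the `Variant` class, the `call_variants` function and the toy sequences are hypothetical, and the choice to skip `N` bases as "no call" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 1-based reference position; for an insertion, the last reference base before it
    ref: str  # reference bases replaced ("" for a pure insertion)
    alt: str  # sample bases observed ("" for a pure deletion)


def call_variants(ref_aln: str, sample_aln: str) -> list[Variant]:
    """Compare two rows of the same alignment, with '-' marking gaps."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")

    variants: list[Variant] = []
    ref_pos = 0        # reference bases consumed so far (gap columns do not count)
    run_start = None   # reference position where the current run of differences began
    run_ref: list[str] = []
    run_alt: list[str] = []

    def flush() -> None:
        # Close the current run of differing columns and record it as one variant.
        nonlocal run_start
        if run_start is not None:
            variants.append(Variant(run_start, "".join(run_ref), "".join(run_alt)))
            run_start = None
            run_ref.clear()
            run_alt.clear()

    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == s or s == "N":  # identical column, or ambiguous base treated as no call
            flush()
            continue
        if run_start is None:
            run_start = ref_pos
        if r != "-":
            run_ref.append(r)
        if s != "-":
            run_alt.append(s)
    flush()
    return variants


# Toy example: one substitution, one insertion and one deletion.
for v in call_variants("ACGT-ACGT", "ACTTGAC-T"):
    print(v)
```

Adjacent differing columns are merged into a single record, so a substitution next to an insertion is reported as one combined deletion-insertion rather than two separate events.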

Findings

We identified 2200 distinct genome variants, including 129 downstream gene variants, 298 frameshift variants, 789 missense variants, 1 start lost, 13 start gained, 1 stop lost, 249 synonymous variants and 720 upstream gene variants. The most common high-impact variants were 10818delTinsG, 2772delCinsC, 14159delCinsC and 2789delAinsA. Variant alignment and phylogenetic tree generation indicate that COVID-19 was likely introduced to the rest of the Middle East from Iran.
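
As a quick consistency check on the figures above, the following sketch sums the per-category counts and splits the four variant labels into position, deleted bases and inserted bases. The dictionary keys, the regular expression and the `parse_delins` helper are illustrative assumptions, not part of the study's tooling.

```python
import re

# Counts per predicted effect, copied from the Findings paragraph.
effect_counts = {
    "downstream gene variant": 129,
    "frameshift variant": 298,
    "missense variant": 789,
    "start lost": 1,
    "start gained": 13,
    "stop lost": 1,
    "synonymous variant": 249,
    "upstream gene variant": 720,
}

total = sum(effect_counts.values())
print(f"total variants: {total}")  # 2200, matching the reported total

# Share of each category, largest first.
for effect, n in sorted(effect_counts.items(), key=lambda kv: -kv[1]):
    print(f"{effect:<26}{n:>5}  {100 * n / total:5.1f}%")

# Hypothetical parser for labels of the form <position>del<bases>ins<bases>.
DELINS = re.compile(r"^(\d+)del([ACGT]+)ins([ACGT]+)$")


def parse_delins(label: str) -> tuple[int, str, str]:
    """Return (position, deleted bases, inserted bases) for one label."""
    match = DELINS.match(label)
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    return int(match.group(1)), match.group(2), match.group(3)


top_variants = ["10818delTinsG", "2772delCinsC", "14159delCinsC", "2789delAinsA"]
parsed = [parse_delins(label) for label in top_variants]

# Labels whose deleted and inserted bases are identical as printed.
same_allele = [
    label for label, (_, deleted, inserted) in zip(top_variants, parsed)
    if deleted == inserted
]
print("identical del/ins bases:", same_allele)
```

The eight categories add up to the reported 2200. Three of the four labels list the same base as both deleted and inserted; the sketch only flags this and does not attempt to interpret it.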

Interpretation

The phylogenetic and variant analysis provides insight into the mutation types found in these genomes. The initial introduction of COVID-19 was most likely due to transmission from Iran. Some countries show evidence of novel mutations and unique strains. Longer circulation within small populations is likely to contribute to more unique genomes. This study provides a more in-depth analysis of the variants in the region than any other study.

Funding

None

Article activity feed

  1. SciScore for 10.1101/2020.10.09.332692:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    10 samples were selected as the optimum number to cover all possible countries and remain within the alignment file size limit of 4 Mb (the maximum size for the Clustal Omega tool).
    Clustal Omega tool
    suggested: None
    The Clustal Omega online tool was used to perform the alignment (found at: https://www.ebi.ac.uk/Tools/msa/clustalo/).
    Clustal Omega
    suggested: (Clustal Omega, RRID:SCR_001591)
    SnpEff also predicts the effect of the variants.
    SnpEff
    suggested: (SnpEff, RRID:SCR_005191)
    To create the SnpEff database, we downloaded data from NCBI for Wuhan reference NC_045512.2 and uploaded to Galaxy.
    Galaxy
    suggested: (Galaxy, RRID:SCR_006281)
    The dplyr v0.8.4 package was used to summarise and align the data [13].
    dplyr
    suggested: (dplyr, RRID:SCR_016708)
    The ggplot2 package was used to align the identified variants and visualise the types of mutations that re-occur [14].
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    We first selected the file generated using BEAST and output the tree file.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    Then the output NEXUS file was imported into the FigTree program for display.
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
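
The first sentence quoted in the table above ties the sample count to a 4 Mb input limit. The sketch below is a rough back-of-the-envelope check of that budget, not a calculation taken from the manuscript. It assumes that "4 Mb" means 4 MiB, that every genome is as long as the Wuhan reference NC_045512.2 (29,903 nt), that the "10 samples" are per country, and that FASTA headers are about 60 characters with 70-base lines; the function names are hypothetical.

```python
import math

GENOME_LENGTH = 29_903            # length of NC_045512.2 in nucleotides
SIZE_LIMIT_BYTES = 4 * 1024 ** 2  # assumption: "4 Mb" is read as 4 MiB


def fasta_record_bytes(seq_len: int, header_len: int = 60, line_width: int = 70) -> int:
    """Approximate size of one FASTA record (header line plus wrapped sequence)."""
    newlines = 1 + math.ceil(seq_len / line_width)  # one after the header, one per sequence line
    return header_len + seq_len + newlines


def max_records(limit_bytes: int, seq_len: int = GENOME_LENGTH) -> int:
    """Largest number of full-length genomes that fits under the size limit."""
    return limit_bytes // fasta_record_bytes(seq_len)


def fits(n_records: int, limit_bytes: int = SIZE_LIMIT_BYTES,
         seq_len: int = GENOME_LENGTH) -> bool:
    """True if n_records full-length genomes stay within the size limit."""
    return n_records * fasta_record_bytes(seq_len) <= limit_bytes


countries = 13    # from the abstract
per_country = 10  # assumed reading of "10 samples"

print("bytes per record:", fasta_record_bytes(GENOME_LENGTH))
print("records that fit:", max_records(SIZE_LIMIT_BYTES))
print("13 x 10 genomes fit:", fits(countries * per_country))
print("122 genomes fit:", fits(122))
```

Under these assumptions, both 13 countries at 10 genomes each and the 122 samples actually analysed fit within the limit. Real GISAID records vary in length and header size, so the margin is indicative only.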

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.