Variant analysis of SARS-CoV-2 genomes in the Middle East
Listed in: Evaluated articles (ScreenIT)
Abstract
Background
Coronavirus disease 2019 (COVID-19) emerged in late 2019 and has now reached over 26 million cases and 850,000 deaths. The Middle East has a death toll of ∼50,000, and over 20,000 of these deaths are in Iran, which has over 350,000 confirmed cases. We hypothesise that Iranian cases seeded outbreaks in the neighbouring countries and that variant mapping and phylogenetic analysis can be used to test this. We also aim to analyse the variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) to characterise the common genome variants and provide useful data for the global effort to prevent further spread of COVID-19.
Methods
We used bioinformatics methods, including multiple sequence alignment, variant calling and annotation, and phylogenetic analysis, to identify the genomic variants found in the region. The analysis covers 122 samples from the 13 countries of the Middle East, sourced from the Global Initiative on Sharing All Influenza Data (GISAID).
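As a rough illustration of the variant-calling step, the sketch below compares one sample against a reference after both have been aligned, and reports substitutions, insertions and deletions in reference coordinates. It is not the authors' pipeline: the function name `call_variants`, the `Variant` record and the toy sequences are hypothetical, and real workflows would use a dedicated variant caller and annotator.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int   # 1-based reference position (for insertions: the preceding reference base)
    ref: str   # reference allele, "-" for an insertion
    alt: str   # sample allele, "-" for a deletion
    kind: str  # "substitution", "insertion" or "deletion"


def call_variants(ref_aligned: str, sample_aligned: str) -> List[Variant]:
    """Compare two rows of an alignment column by column (hypothetical helper)."""
    if len(ref_aligned) != len(sample_aligned):
        raise ValueError("aligned sequences must have equal length")

    variants: List[Variant] = []
    ref_pos = 0  # counts reference bases only, so gaps do not shift coordinates
    for r, s in zip(ref_aligned.upper(), sample_aligned.upper()):
        if r != "-":
            ref_pos += 1
        # Identical columns and ambiguous sample bases (N) are not called.
        if r == s or s == "N":
            continue
        if r == "-":
            variants.append(Variant(ref_pos, "-", s, "insertion"))
        elif s == "-":
            variants.append(Variant(ref_pos, r, "-", "deletion"))
        else:
            variants.append(Variant(ref_pos, r, s, "substitution"))
    return variants


# Toy alignment (made-up sequences, not SARS-CoV-2 data).
reference = "ATG-CCGTA"
sample = "ATGACTG-A"
for variant in call_variants(reference, sample):
    print(variant)
```

Tracking the reference position separately from the alignment column keeps the reported coordinates comparable across samples, which is what allows variants from different genomes to be tallied against one reference.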
Findings
We identified 2200 distinct genome variants, comprising 129 downstream gene variants, 298 frameshift variants, 789 missense variants, 1 start lost, 13 start gained, 1 stop lost, 249 synonymous variants and 720 upstream gene variants. The most common high-impact variants were 10818delTinsG, 2772delCinsC, 14159delCinsC and 2789delAinsA. Variant alignment and phylogenetic tree generation indicate that samples from Iran likely introduced COVID-19 to the rest of the Middle East.
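The category counts reported above can be checked and summarised with a few lines of arithmetic. The sketch below uses only the numbers stated in this abstract; the dictionary keys and the helper name `percentage_shares` are illustrative labels, not output from the authors' annotation.

```python
# Variant counts by predicted effect, copied from the Findings paragraph.
effect_counts = {
    "downstream gene variant": 129,
    "frameshift variant": 298,
    "missense variant": 789,
    "start lost": 1,
    "start gained": 13,
    "stop lost": 1,
    "synonymous variant": 249,
    "upstream gene variant": 720,
}


def percentage_shares(counts: dict) -> dict:
    """Return each category's share of the total, in percent (hypothetical helper)."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total_variants = sum(effect_counts.values())
print(f"total variants: {total_variants}")

# List categories from most to least frequent.
shares = percentage_shares(effect_counts)
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {effect_counts[name]:5d}  {share:5.1f}%")
```

The eight categories sum to the stated total of 2200, so the list is a complete breakdown rather than a selection of examples.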
Interpretation
The phylogenetic and variant analysis provides unique insight into the mutation types found in these genomes. The initial introduction of COVID-19 was most likely due to transmission from Iran. Some countries show evidence of novel mutations and unique strains. Prolonged circulation in small populations is likely to contribute to more unique genomes. This study provides a more in-depth analysis of the variants in the region than any other study.
Funding
None
Article activity feed
SciScore for 10.1101/2020.10.09.332692:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: "10 samples was selected as the optimum number to cover all possible countries and remain within alignment file limit of size 4 Mb (maximum size for Clustal Omega tool)." Resource: Clustal Omega tool. Suggested: None
- Sentence: "The Clustal Omega online tool was used to perform the alignment (found at: https://www.ebi.ac.uk/Tools/msa/clustalo/)." Resource: Clustal Omega. Suggested: (Clustal Omega, RRID:SCR_001591)
- Sentence: "SnpEff also predicts the effect of the variants." Resource: SnpEff. Suggested: (SnpEff, RRID:SCR_005191)
- Sentence: "To create the SnpEff database, we downloaded data from NCBI for Wuhan reference NC_045512.2 and uploaded to Galaxy." Resource: Galaxy. Suggested: (Galaxy, RRID:SCR_006281)
- Sentence: "The dplyr v0.8.4 package was used to summarise and align the data13." Resource: dplyr. Suggested: (dplyr, RRID:SCR_016708)
- Sentence: "The ggplot2 package was used to align the identified variants and visualise the types of mutations that re-occur14." Resource: ggplot2. Suggested: (ggplot2, RRID:SCR_014601)
- Sentence: "We first select the file generated using BEAST and outputted the tree file." Resource: BEAST. Suggested: (BEAST, RRID:SCR_010228)
- Sentence: "Then the output NEXUS file was imported to FigTree program to display." Resource: FigTree. Suggested: (FigTree, RRID:SCR_008515)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
