Large scale genomic analysis of 3067 SARS-CoV-2 genomes reveals a clonal geo-distribution and a rich genetic variations of hotspots mutations

Abstract

In late December 2019, an emerging viral infection, COVID-19, was identified in Wuhan, China, and became a global pandemic. Characterizing the genetic variants of SARS-CoV-2 is crucial for tracking and evaluating its spread across countries. In this study, we collected and analyzed 3,067 SARS-CoV-2 genomes isolated from 55 countries during the first three months of the outbreak. Using comparative genomics, we traced the profiles of whole-genome mutations and compared the frequency of each mutation in the studied population. We also monitored the accumulation of mutations over the epidemic period together with their geographic locations. The results showed 782 variant sites, of which 512 (65.47%) had a non-synonymous effect. Frequencies of mutated alleles revealed 38 recurrent non-synonymous mutations, including ten hotspot mutations with a prevalence higher than 0.10 in this population, distributed across six SARS-CoV-2 genes. Mapping these recurrent mutations onto the world map revealed certain genotypes specific to geographic locations. We also found co-occurring mutations that give rise to several haplotypes. Moreover, tracking mutations over time revealed a mechanism of mutation co-accumulation that might affect the severity and spread of SARS-CoV-2.
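The hotspot criterion above (prevalence higher than 0.10 across the genomes) and the non-synonymous share can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of mutation labels; the `genomes` mapping, the labels `m1` to `m3`, and the function names are hypothetical, not the authors' code. Each mutation is counted at most once per genome and the threshold is treated as strict. The toy data are invented for illustration and are not results from the study.

```python
from collections import Counter
from typing import Dict, List, Set


def mutation_prevalence(genomes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, for each mutation, the fraction of genomes that carry it.

    `genomes` maps a (hypothetical) genome ID to the set of mutation labels
    called in that genome, so a mutation is counted at most once per genome.
    """
    n = len(genomes)
    counts = Counter(m for muts in genomes.values() for m in muts)
    return {m: c / n for m, c in counts.items()}


def hotspots(prevalence: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Mutations whose prevalence is strictly greater than `threshold`, sorted."""
    return sorted(m for m, p in prevalence.items() if p > threshold)


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, e.g. non-synonymous / all variant sites."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Toy data (not from the study): ten genomes, three hypothetical mutations.
    toy: Dict[str, Set[str]] = {f"g{i}": set() for i in range(10)}
    for i in range(6):
        toy[f"g{i}"].add("m1")  # carried by 6 of 10 genomes
    for i in range(4):
        toy[f"g{i}"].add("m2")  # carried by 4 of 10 genomes
    toy["g6"].add("m3")  # 1 of 10 genomes: 0.10 is not > 0.10
    print(hotspots(mutation_prevalence(toy)))  # ['m1', 'm2']
    print(percent(512, 782))  # 65.47, the non-synonymous share reported above
```

The strict inequality matters at the boundary: in the toy data, `m3` sits exactly at 0.10 and is therefore not reported as a hotspot.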

In addition, analysis of selective pressure revealed negatively selected residues that could be considered as potential therapeutic targets.
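Selective pressure is commonly summarized by ω = dN/dS: ω < 1 indicates purifying (negative) selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. The minimal sketch below assumes per-codon dN and dS estimates and a p-value are already available (for example, from a site-level HyPhy method such as FEL). The `SiteEstimate` record, its field names, and the `alpha = 0.1` cut-off are illustrative assumptions; the code does not parse HyPhy's actual output.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SiteEstimate:
    """Hypothetical per-codon-site record: dN and dS estimates plus a p-value
    for the test dN != dS (field names are illustrative, not HyPhy's)."""
    site: int
    dn: float
    ds: float
    p_value: float


def omega(est: SiteEstimate) -> float:
    """omega = dN / dS; inf when only dS is zero, nan when both are zero."""
    if est.ds == 0:
        return math.inf if est.dn > 0 else math.nan
    return est.dn / est.ds


def classify(est: SiteEstimate, alpha: float = 0.1) -> str:
    """'negative' if significant with omega < 1, 'positive' if significant with
    omega > 1, otherwise 'neutral' (nan comparisons are False, so 0/0 is neutral)."""
    if est.p_value > alpha:
        return "neutral"
    w = omega(est)
    if w < 1:
        return "negative"
    if w > 1:
        return "positive"
    return "neutral"


def negatively_selected(estimates: List[SiteEstimate], alpha: float = 0.1) -> List[int]:
    """Codon sites classified as under negative (purifying) selection."""
    return [e.site for e in estimates if classify(e, alpha) == "negative"]
```

Residues flagged this way change less often than neutral expectation predicts, which is why negatively selected sites are attractive candidates for conserved drug targets.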

We have also created an inclusive, unified database (http://genoma.ma/covid-19/) that lists all the genetic variants of the SARS-CoV-2 genomes found in this study, together with a phylogeographic analysis across the world.

Article activity feed

SciScore for 10.1101/2020.05.03.074567:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The BAM files were sorted by SAMtools sort (9), then used to call the genetic variants in variant call format (VCF) by SAMtools mpileup (9) and bcftools v1.8 (9).	SAMtools suggested: (SAMTOOLS, RRID:SCR_002105)
The final call set of the 3067 genomes was annotated, and the impact of each variant was predicted, using SnpEff v4.3t (10).	SnpEff suggested: (SnpEff, RRID:SCR_005191)
First, the SnpEff databases were built locally using annotations of the reference genome NC_045512.2 obtained in GFF format from the NCBI database.	NCBI suggested: (NCBI, RRID:SCR_006472)
Phylogenetic analysis and geodistribution: The downloaded full-length genome sequences of coronaviruses isolated from different hosts from public databases were subjected to multiple sequence alignments using Muscle v 3.8 (	Muscle suggested: (MUSCLE, RRID:SCR_011812)
Maximum-likelihood phylogenetic trees with 1000 bootstrap replicates were constructed using RAxML v8.2.12 (39).	RaxML suggested: (RAxML, RRID:SCR_006086)
Selective pressure and modelling: We used HyPhy v2.5.8 (13) to estimate the ratio of non-synonymous to synonymous substitution rates, dN/dS (ω).	Hyphy suggested: (HyPhy, RRID:SCR_016162)
The selected nucleotide sequences of each dataset were aligned codon-by-codon using ClustalW, and the phylogenetic tree was inferred by maximum likelihood (ML) in MEGA X (14).	Clustalw suggested: (ClustalW, RRID:SCR_017277) MEGA suggested: (Mega BLAST, RRID:SCR_011920)
Structure visualization and image rendering were performed in PyMOL 2.3 (Schrodinger LLC).	PyMOL suggested: (PyMOL, RRID:SCR_000305)
The reciprocal best BLAST hit strategy (18) was implemented to identify all orthologous genes using Proteinortho v6.0b (19).	BLAST suggested: (BLASTX, RRID:SCR_001653)
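SnpEff decides, among many other effect categories, whether a substitution in a coding region is synonymous or non-synonymous. The core codon-level decision under the standard genetic code can be sketched in Python as follows. This is not SnpEff's implementation: it assumes genome coordinates on NC_045512.2 have already been converted to 0-based positions within an in-frame coding sequence, handles single-nucleotide substitutions only, and ignores indels, multi-nucleotide variants, and the -1 ribosomal frameshift in ORF1ab. `classify_snv` and `is_non_synonymous` are hypothetical names.

```python
from itertools import product
from typing import Tuple

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG
# order with the third base varying fastest, matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds: str, pos: int, alt: str) -> Tuple[str, str, str]:
    """Classify a single-nucleotide substitution in a coding sequence.

    `cds` is an uppercase DNA coding sequence in frame from its first base,
    `pos` is the 0-based position of the substitution within `cds`, and `alt`
    is the alternative base. Returns (effect, reference aa, alternative aa).
    """
    start = (pos // 3) * 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


def is_non_synonymous(effect: str) -> bool:
    """Treat missense, stop-gained and stop-lost changes as non-synonymous."""
    return effect in {"missense", "stop_gained", "stop_lost"}
```

For example, in the toy sequence `ATGGATTAA`, an A-to-G change at position 4 turns codon GAT (Asp) into GGT (Gly), and `classify_snv` returns `("missense", "D", "G")`.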

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Related articles

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.
Pulchérie Pelembi, Philippe Colson, Alain Farra, Ornella Anne Sibiro-Demi, Christian Noël Malaka, Aurélia Kwasiborski, Véronique Hourdel, Gilles Landry Ngaya, Romaric Nzoumbou-Boko, Jean-Claude Manuguerra, Emmanuel Ryvalin Nakoune-Yandoko, Guy VERNET, Bernard La Scola, Valérie Caro, Alexandre Manirakiza

Overview of SARS-CoV-2 Genomic Surveillance in Central America and the Dominican Republic from February 2020 to January 2023: The Impact of PAHO and COMISCA's Collaborative Efforts
Sofia Herrera Agüero, Aldo Sosa, Alexander Martínez, Ambar Moreno, César Roberto Conde Pereira, Claudia Gonzalez, Claudio Soto Garita, Daniel Ulate, Estela Cordero-Laurent, Hebleen Brenes, Isaac Miguel Sánchez, Jairo Mendez-Rico, Jessica Góndola, Jose Arturo Molina-Mora, Juliana Leite, Leticia Franco, Linda Mendoza, Lionel Gresh, Lucia De La Cruz, Mitzi Castro Paz, Monica Barahona, Naomi Iihoshi, Oris Chavarria, Priscila Born, Ruby Melany Aguillón, Ruth Carolina Vasquez Cordova, Selene Gonzalez, Sofia Carolina Alvarado Silva, Xochitl Sandoval López, Yvonne Imbert, Francisco Duarte-Martínez

Dengue Virus Type 2: Global Epidemiology, Molecular Evolution, and Immune Response Insights
Qun Chen, Peipei Ye, Mengye Ma, Zhu Chen, Liming Jiang