Phylogenetic clustering of the Indian SARS-CoV-2 genomes reveals the presence of distinct clades of viral haplotypes among states

Abstract

The first Indian cases of COVID-19, caused by SARS-CoV-2, were reported beginning on January 30, 2020 in patients with a history of travel from Wuhan, China; so far, more than 4,500 deaths in India have been attributed to the pandemic. The objectives of this study were to characterize genome-wide nucleotide variation in Indian SARS-CoV-2 genomes, trace their ancestries using phylogenetic networks, and correlate the state-wise distribution of viral haplotypes with differences in mortality rates. A total of 305 whole-genome sequences from 19 Indian states were downloaded from GISAID and aligned against the ancestral Wuhan-Hu-1 reference genome (NC_045512.2). A total of 633 variants, resulting in 388 amino acid substitutions, were identified. The allele frequency spectrum and nucleotide diversity (π) values revealed a high proportion of low-frequency variants, and negative Tajima's D values across ORFs indicated population expansion. Network analysis highlighted two major clusters of viral haplotypes: clade G, carrying the S:D614G and RdRp:P323L variants, and a variant of clade L (L_v) carrying the RdRp:A97V variant. Clade G genomes were found to be evolving more rapidly into multiple sub-clusters, including clades GH and GR, and were also found in higher proportions in the three states with the highest mortality rates: Gujarat, Madhya Pradesh and West Bengal.
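The variant-identification and allele frequency spectrum steps summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes each genome is already aligned to the NC_045512.2 reference as an equal-length string, counts only single-nucleotide differences (skipping gaps, ambiguous bases such as `N`, and columns where the reference itself has a gap), and uses hypothetical helper names (`call_variants`, `allele_frequency_spectrum`) and toy sequences rather than real data.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def call_variants(reference, aligned):
    """Return {(position, ref_base, alt_base)} for one genome aligned to the reference.

    Positions are 1-based reference coordinates. Columns where the reference has a
    gap (insertions in the query) are skipped, as are gaps or ambiguous bases in the
    query. Only single-nucleotide substitutions are reported (a simplifying assumption).
    """
    variants = set()
    pos = 0
    for ref_base, alt_base in zip(reference, aligned):
        if ref_base == "-":
            continue  # insertion relative to the reference: no reference coordinate
        pos += 1
        if ref_base in NUCLEOTIDES and alt_base in NUCLEOTIDES and alt_base != ref_base:
            variants.add((pos, ref_base, alt_base))
    return variants


def allele_frequency_spectrum(reference, genomes):
    """Count carriers per variant, then how many variants have 1, 2, ... carriers."""
    carriers = Counter()
    for genome in genomes:
        carriers.update(call_variants(reference, genome))
    spectrum = Counter(carriers.values())  # carrier count -> number of variants
    return carriers, spectrum


# Toy data (hypothetical, not from the study): an 8-nt "reference" and four genomes.
reference = "ACGTACGT"
genomes = ["ACGTACGA", "GCGTACGA", "ACGTTCGT", "ACGTACGT"]
carriers, spectrum = allele_frequency_spectrum(reference, genomes)
print(sorted(carriers.items()))
print(sorted(spectrum.items()))
```

In a real data set, a spectrum whose largest class is the variants carried by a single genome (singletons) is the pattern the abstract describes as a high proportion of low-frequency variants.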

Article activity feed

SciScore for 10.1101/2020.05.28.122143:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.
Cell Line Authentication	not detected.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
Out of a total of 305 sequences, 26 were found without state information and 7 had been grown in Vero cells.	Vero suggested: (CLS Cat# 605372/p622_VERO, RRID:CVCL_0059)
Software and Algorithms
Sentences	Resources
Multiple sequence alignment was executed using MUSCLE [11] with three iterations for both.	MUSCLE suggested: (MUSCLE, RRID:SCR_011812)
The SIFT database was used to identify amino acid changes that could affect protein function (http://blocks.fhcrc.org/sift/SIFT_seq_submit2.html) [12].	SIFT suggested: (SIFT, RRID:SCR_012813)
2.2 Measurements of diversity and deviation from neutrality: Watterson's estimator (θw), nucleotide diversity (π) and Tajima's D [13] for each open reading frame (ORF) were calculated using MEGA X [14].	MEGA suggested: (Mega BLAST, RRID:SCR_011920)
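The three statistics named in the methods sentence above can be computed from one ORF's alignment with Tajima's formulas. The Python sketch below is an illustration under stated assumptions, not a reproduction of MEGA X's implementation: it takes aligned sequences as equal-length strings, ignores gaps and ambiguous bases (MEGA X's own missing-data options may give different numbers), and reports θw and π as counts per sequence rather than per site (divide by the number of aligned sites for per-site values). The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

NUCLEOTIDES = set("ACGT")


def segregating_sites(seqs):
    """Number of alignment columns with two or more distinct nucleotides (S)."""
    return sum(
        1
        for column in zip(*seqs)
        if len({base for base in column if base in NUCLEOTIDES}) > 1
    )


def mean_pairwise_differences(seqs):
    """Nucleotide diversity pi: mean number of differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(
        sum(1 for x, y in zip(a, b) if x != y and x in NUCLEOTIDES and y in NUCLEOTIDES)
        for a, b in pairs
    )
    return total / len(pairs)


def tajimas_d(seqs):
    """Return (theta_w, pi, D); D is None when there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    S = segregating_sites(seqs)
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1  # Watterson's estimator
    if S == 0:
        return theta_w, pi, None
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return theta_w, pi, d


# Toy star-like alignment (hypothetical): one ancestral sequence plus four
# sequences that each carry a private mutation, as expected after an expansion.
orf = ["AAAAA", "CAAAA", "ACAAA", "AACAA", "AAACA"]
theta_w, pi, d = tajimas_d(orf)
print(f"theta_W={theta_w:.3f}  pi={pi:.3f}  D={d:.3f}")
```

A negative D means π falls below θw, i.e. there is an excess of rare variants relative to neutral expectations at constant population size; this is the signal the abstract interprets as population expansion.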

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

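Related articles

Overview of SARS-CoV-2 Genomic Surveillance in Central America and the Dominican Republic from February 2020 to January 2023: The Impact of PAHO and COMISCA's Collaborative Efforts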

Sofia Herrera Agüero
Aldo Sosa
Alexander Martínez
Ambar Moreno
César Roberto Conde Pereira
Claudia Gonzalez
Claudio Soto Garita
Daniel Ulate
Estela Cordero-Laurent
Hebleen Brenes
Isaac Miguel Sánchez
Jairo Mendez-Rico
Jessica Góndola
Jose Arturo Molina-Mora
Juliana Leite
Leticia Franco
Linda Mendoza
Lionel Gresh
Lucia De La Cruz
Mitzi Castro Paz
Monica Barahona
Naomi Iihoshi
Oris Chavarria
Priscila Born
Ruby Melany Aguillón
Ruth Carolina Vasquez Cordova
Selene Gonzalez
Sofia Carolina Alvarado Silva
Xochitl Sandoval López
Yvonne Imbert
Francisco Duarte-Martínez

Dengue Virus Type 2: Global Epidemiology, Molecular Evolution, and Immune Response Insights

Qun Chen
Peipei Ye
Mengye Ma
Zhu Chen
Liming Jiang

Genomic characterization of SARS-CoV-2 variants circulating in the population of Bangui, Central African Republic (CAR) in 2022.

Pulchérie Pelembi
Philippe Colson
Alain Farra
Ornella Anne Sibiro-Demi
Christian Noël Malaka
Aurélia Kwasiborski
Véronique Hourdel
Gilles Landry Ngaya
Romaric Nzoumbou-Boko
Jean-Claude Manuguerra
Emmanuel Ryvalin Nakoune-Yandoko
Guy VERNET
Bernard La Scola
Valérie Caro
Alexandre Manirakiza
