A Distinct Phylogenetic Cluster of Indian Severe Acute Respiratory Syndrome Coronavirus 2 Isolates

Abstract

Background

From an isolated epidemic, coronavirus disease 2019 has now emerged as a global pandemic. The availability of genomes in the public domain after the epidemic provides a unique opportunity to understand the evolution and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus across the globe.

Methods

We performed whole-genome sequencing of 303 Indian isolates, and we analyzed them in the context of publicly available data from India.

Results

We describe a distinct phylogenetic cluster (Clade I/A3i) of SARS-CoV-2 genomes from India, which encompasses 22% of all genomes deposited in the public domain from India. Globally, approximately 2% of genomes, which to date could not be mapped to any distinct known cluster, fall within this clade.

Conclusions

The cluster is characterized by a core set of 4 genetic variants and has a nucleotide substitution rate of 1.1 × 10⁻³ variants per site per year, which is lower than that of the prevalent A2a cluster. Epidemiological assessments suggest that the common ancestor emerged at the end of January 2020 and possibly resulted in an outbreak followed by countrywide spread. To the best of our knowledge, this is the first comprehensive study characterizing this cluster of SARS-CoV-2 in India.
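
To give the per-site rate a more intuitive scale, it can be converted into an expected number of substitutions per genome. The sketch below is illustrative arithmetic only and is not part of the study's analysis; the genome length of 29,903 nt (the length of the reference MN908947.3) and the function name are assumptions introduced here.

```python
# Illustrative arithmetic, not the authors' method.
# RATE_PER_SITE_PER_YEAR comes from the abstract; GENOME_LENGTH is an
# assumption (length of the MN908947.3 reference, in nucleotides).
RATE_PER_SITE_PER_YEAR = 1.1e-3
GENOME_LENGTH = 29_903


def substitutions_per_genome(rate: float, genome_length: int, years: float = 1.0) -> float:
    """Expected substitutions across the whole genome over `years`,
    assuming a constant (strict-clock) per-site rate."""
    return rate * genome_length * years


per_year = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH)
per_month = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH, years=1 / 12)

print(f"{per_year:.1f} substitutions per genome per year")
print(f"{per_month:.1f} substitutions per genome per month")
```

Under these assumptions the rate corresponds to roughly 33 substitutions per genome per year, or about 2 to 3 per month.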

SciScore for 10.1101/2020.05.31.126136:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Following rRNA depletion and fragmentation, RNA was converted to cDNA and processed for NGS library prep using TruSeq Stranded Total RNA library prep kit (	NGS suggested: (PM4NGS, RRID:SCR_019164)
Assembly of sequencing data: Raw image data was basecalled using bcl2fastq v2.20 with --no-lane-splitting option enabled.	bcl2fastq suggested: (bcl2fastq , RRID:SCR_015058)
QC of the FASTQ files was performed using FastQC v0.11.7, and adaptors/poor quality bases were trimmed using Trimmomatic [6,7]. Reads were aligned to the reference genome MN908947.3 using hisat2 [8]. Consensus sequence from the bam file was derived using seqtk and bcftools [9].	FastQC suggested: (FastQC, RRID:SCR_014583)
Samtools depth command was used to calculate the coverage across the genome, with the options -a and -d 50000 [10]. The sequences were deposited in GISAID.	Samtools suggested: (SAMTOOLS, RRID:SCR_002105)
For this, the individual samples were systematically aligned to the Wuhan-Hu-1 reference genome (NC_045512) using EMBOSS needle.	EMBOSS suggested: (EMBOSS, RRID:SCR_008493)
The sequences were aligned against the WH1 reference genome using MAFFT and spurious variant positions were masked from the alignment.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Divergence estimates and most recent common ancestor: The substitution rate of nCoV-2 was estimated using BEAST v1.10.4 following a methodology described previously [18,19]. The masked alignment file was used as input to estimate a strict molecular clock and a coalescent growth rate model (exponential growth).	BEAST suggested: (BEAST, RRID:SCR_010228)
The log output of each MCMC run was analysed in Tracer v1.7.1 [20]. The priors used for each parameter in the analysis are given in Supplementary Data 4.	Tracer suggested: (Tracer, RRID:SCR_019121)
Functional evaluation of variants: The variants were also evaluated for the functional consequences using the Sorting Intolerant from Tolerant (SIFT).	SIFT suggested: (SIFT, RRID:SCR_012813)
The functional effects of protein variants identified in the clades were assessed using the PROVEAN web server, using the protein sequences of the Wuhan-Hu-1 genome as reference and a default threshold value of −2.5 [22]. Additionally, PhyloP conservation scores and base-wise GERP Rejected Substitutions scores (RS scores) for the variants were also computed.	PROVEAN suggested: (PROVEAN, RRID:SCR_002182)
The variants were also checked for overlaps with immune epitope predictions as given on UCSC Genome Browser for SARS-CoV-2.	UCSC Genome Browser suggested: (UCSC Genome Browser, RRID:SCR_005780)
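
The coverage step quoted in the table (samtools depth with -a and -d 50000) produces a tab-separated listing with one row per reference position: reference name, 1-based position, and read depth. A minimal sketch of how such output could be summarized follows. It is not the authors' code; the function name, the `min_depth` threshold of 10, and the four example rows are hypothetical values chosen for illustration.

```python
import io
from typing import Iterable, Tuple


def summarize_depth(lines: Iterable[str], min_depth: int = 10) -> Tuple[float, float]:
    """Summarize single-sample `samtools depth -a` output.

    Each input line is expected to hold three tab-separated fields:
    reference name, 1-based position, depth.
    Returns (mean depth, fraction of positions with depth >= min_depth).
    `min_depth` is a hypothetical threshold, not a value from the study.
    """
    total_depth = 0
    covered = 0
    positions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth_field = line.split("\t")[:3]
        depth = int(depth_field)
        total_depth += depth
        covered += depth >= min_depth
        positions += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return total_depth / positions, covered / positions


# Made-up example rows in the samtools depth layout (not real data).
example = (
    "MN908947.3\t1\t0\n"
    "MN908947.3\t2\t12\n"
    "MN908947.3\t3\t30\n"
    "MN908947.3\t4\t18\n"
)
mean_depth, breadth = summarize_depth(io.StringIO(example))
print(f"mean depth: {mean_depth:.1f}, breadth at >=10x: {breadth:.2f}")
```

Because -a reports every position, including those with zero depth, the mean and breadth are computed over the full reference length rather than only over covered sites. In practice the function could be given an open file handle on the saved depth output instead of the in-memory example.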

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
