A Distinct Phylogenetic Cluster of Indian Severe Acute Respiratory Syndrome Coronavirus 2 Isolates

Abstract

Background

Coronavirus disease 2019 began as an isolated epidemic and has since emerged as a global pandemic. The genomes made available in the public domain since the epidemic provide a unique opportunity to understand the evolution and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across the globe.

Methods

We performed whole-genome sequencing of 303 Indian isolates, and we analyzed them in the context of publicly available data from India.

Results

We describe a distinct phylogenetic cluster (Clade I/A3i) of SARS-CoV-2 genomes from India, which encompasses 22% of all genomes deposited in the public domain from India. Globally, approximately 2% of genomes, which to date could not be mapped to any distinct known cluster, fall within this clade.

Conclusions

The cluster is characterized by a core set of 4 genetic variants and has a nucleotide substitution rate of 1.1 × 10⁻³ variants per site per year, which is lower than that of the prevalent A2a cluster. Epidemiological assessments suggest that the common ancestor emerged at the end of January 2020 and possibly resulted in an outbreak followed by countrywide spread. To the best of our knowledge, this is the first comprehensive study characterizing this cluster of SARS-CoV-2 in India.
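
To give a sense of scale for the substitution rate reported above, the following is a minimal arithmetic sketch that converts a per-site rate into an expected number of substitutions per genome. The genome length of 29,903 nucleotides is an assumption taken from the Wuhan-Hu-1 reference sequence, not a figure stated in the abstract, and the function name is hypothetical. The calculation assumes a constant rate over time and is only illustrative; it is not the authors' estimation procedure.

```python
# Illustrative arithmetic only, not the authors' method.
# RATE_PER_SITE_PER_YEAR is the value quoted in the abstract.
# GENOME_LENGTH_NT is an ASSUMPTION: the length of the Wuhan-Hu-1 reference.
RATE_PER_SITE_PER_YEAR = 1.1e-3
GENOME_LENGTH_NT = 29_903


def expected_substitutions(rate_per_site_per_year, genome_length_nt, years):
    """Expected substitutions per genome over `years`, assuming a constant rate."""
    return rate_per_site_per_year * genome_length_nt * years


per_year = expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT, 1.0)
per_month = expected_substitutions(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT, 1.0 / 12.0)

print(f"Expected substitutions per genome per year:  {per_year:.1f}")
print(f"Expected substitutions per genome per month: {per_month:.1f}")
```

Under these assumptions the rate corresponds to roughly 33 substitutions per genome per year, or between two and three per month.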

Article activity feed

  1. SciScore for 10.1101/2020.05.31.126136:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Following rRNA depletion and fragmentation, RNA was converted to cDNA and processed for NGS library prep using TruSeq Stranded Total RNA library prep kit (
    NGS
    suggested: (PM4NGS, RRID:SCR_019164)
    Assembly of sequencing data: Raw image data was basecalled using bcl2fastq v2.20 with --no-lane-splitting option enabled.
    bcl2fastq
    suggested: (bcl2fastq , RRID:SCR_015058)
    QC of the FASTQ files was performed using FastQC v0.11.7, and adaptors/poor quality bases were trimmed using Trimmomatic [6,7]. Reads were aligned to the reference genome MN908947.3 using hisat2 [8]. Consensus sequence from the bam file was derived using seqtk and bcftools [9].
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    Samtools depth command was used to calculate the coverage across the genome, with the options -a and -d 50000 [10]. The sequences were deposited in GISAID.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    For this, the individual samples were systematically aligned to the Wuhan-Hu-1 reference genome (NC_045512) using EMBOSS needle.
    EMBOSS
    suggested: (EMBOSS, RRID:SCR_008493)
    The sequences were aligned against the WH1 reference genome using MAFFT and spurious variant positions were masked from the alignment.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Divergence estimates and most recent common ancestor: The substitution rate of nCoV-2 was estimated using BEAST v1.10.4 following a methodology described previously [18,19]. The masked alignment file was used as input to estimate a strict molecular clock and a coalescent growth rate model (exponential growth).
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    The log output of each MCMC run was analysed in Tracer v1.7.1 [20]. The priors used for each parameter in the analysis are given in Supplementary Data 4.
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    Functional evaluation of variants: The variants were also evaluated for the functional consequences using the Sorting Intolerant from Tolerant (SIFT).
    SIFT
    suggested: (SIFT, RRID:SCR_012813)
    The functional effects of protein variants identified in the clades were assessed using the PROVEAN web server, using the protein sequences of the Wuhan-Hu-1 genome as reference and a default threshold value of −2.5 [22]. Additionally, PhyloP conservation scores and base-wise GERP Rejected Substitutions scores (RS scores) for the variants were also computed.
    PROVEAN
    suggested: (PROVEAN, RRID:SCR_002182)
    The variants were also checked for overlaps with immune epitope predictions as given on UCSC Genome Browser for SARS-CoV-2.
    UCSC Genome Browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
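
The coverage step quoted in the table above relies on `samtools depth`, which for a single BAM file writes one tab-separated line per position (reference name, 1-based position, depth); with `-a`, positions with zero depth are included. The following is a minimal sketch of how such output could be summarized. The function name, the example rows, and the `min_depth=10` cutoff are all hypothetical and are not taken from the manuscript.

```python
import io


def summarize_depth(lines, min_depth=10):
    """Summarize `samtools depth` output for a single BAM file.

    `lines` is any iterable of text lines with three tab-separated columns:
    reference name, 1-based position, depth.
    `min_depth` is a HYPOTHETICAL cutoff chosen for illustration only.
    """
    positions = 0
    total_depth = 0
    covered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        depth = int(depth)
        positions += 1
        total_depth += depth
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return {
        "positions": positions,
        "mean_depth": total_depth / positions,
        "fraction_covered": covered / positions,
    }


# Made-up example rows in the three-column layout described above.
example = (
    "MN908947.3\t1\t0\n"
    "MN908947.3\t2\t12\n"
    "MN908947.3\t3\t30\n"
    "MN908947.3\t4\t18\n"
)
summary = summarize_depth(io.StringIO(example), min_depth=10)
print(summary)
```

In practice the same function could be given an open file handle on the saved depth output instead of the in-memory example.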

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.