A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates

Abstract

From an isolated epidemic, COVID-19 has now emerged as a global pandemic. The availability of genomes in the public domain following the epidemic provides a unique opportunity to understand the evolution and spread of the SARS-CoV-2 virus across the globe. The availability of whole genomes from multiple states in India prompted us to analyse the phylogenetic clusters of genomes in India. We performed whole-genome sequencing for 64 genomes, bringing the total to 361 genomes from India, followed by phylogenetic clustering, substitution analysis, and dating of the different phylogenetic clusters of viral genomes. We describe a distinct phylogenetic cluster (Clade I / A3i) of SARS-CoV-2 genomes from India, which encompasses 41% of all genomes sequenced and deposited in the public domain from multiple states in India. Globally, 3.5% of genomes, which to date could not be mapped to any distinct known cluster, fall in this newly defined clade. The cluster is characterized by a core set of shared genetic variants: C6312A (T2016K), C13730T (A88V/A97V), C23929T, and C28311T (P13L). Further, the cluster is characterized by a nucleotide substitution rate of 1.4 × 10⁻³ variants per site per year, lower than that of the prevalent A2a cluster, and is predominantly driven by variants in the E and N genes, with relative sparing of the S gene. Epidemiological assessments suggest that the common ancestor emerged in February 2020 and possibly resulted in an outbreak followed by countrywide spread, as evidenced by the low divergence of the genomes from across the country. To the best of our knowledge, this is the first comprehensive study characterizing the distinct and predominant cluster of SARS-CoV-2 in India.
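
The core variant set named in the abstract can be read as a simple membership test. The following is a minimal sketch, not the authors' method (the study assigns genomes by phylogenetic clustering): it only checks whether a genome carries the alternate base at each of the four listed positions. The function names and the assumption that the input string is already in reference coordinates are illustrative.

```python
import re

# Core variants listed in the abstract for Clade I / A3i (nucleotide notation).
CORE_VARIANTS = ("C6312A", "C13730T", "C23929T", "C28311T")

_VARIANT_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_variant(label):
    """Split a label such as 'C6312A' into (ref_base, position, alt_base)."""
    match = _VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-nucleotide variant label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def carries_core_variants(aligned_genome, core=CORE_VARIANTS):
    """Return True if the genome has the alternate base at every core position.

    Assumption: `aligned_genome` is a string in reference coordinates, so the
    1-based position p sits at index p - 1 (for example, a row of an alignment
    against the reference with insertions removed).
    """
    for label in core:
        _ref, pos, alt = parse_variant(label)
        if pos > len(aligned_genome) or aligned_genome[pos - 1].upper() != alt:
            return False
    return True
```

A genome that lacks any one of the four alternate bases fails the check, so this is a strict screen; a real assignment would also need to handle ambiguous bases and low-coverage positions.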

Article activity feed

  1. SciScore for 10.1101/2020.05.31.126136:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Following rRNA depletion and fragmentation, RNA was converted to cDNA and processed for NGS library prep using TruSeq Stranded Total RNA library prep kit (
    NGS
    suggested: (PM4NGS, RRID:SCR_019164)
    Assembly of sequencing data: Raw image data was basecalled using bcl2fastq v2.20 with --no-lane-splitting option enabled.
    bcl2fastq
    suggested: (bcl2fastq, RRID:SCR_015058)
    QC of the FASTQ files was performed using FastQC v0.11.7, and adaptors/poor quality bases were trimmed using Trimmomatic [6,7]. Reads were aligned to the reference genome MN908947.3 using hisat2 [8]. Consensus sequence from the bam file was derived using seqtk and bcftools [9].
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    Samtools depth command was used to calculate the coverage across the genome, with the options -a and -d 50000 [10]. The sequences were deposited in GISAID.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    For this, the individual samples were systematically aligned to the Wuhan-Hu-1 reference genome (NC_045512) using EMBOSS needle.
    EMBOSS
    suggested: (EMBOSS, RRID:SCR_008493)
    The sequences were aligned against the WH1 reference genome using MAFFT and spurious variant positions were masked from the alignment.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Divergence estimates and most recent common ancestor: The substitution rate of nCoV-2 was estimated using BEAST v1.10.4 following a methodology described previously [18,19]. The masked alignment file was used as input to estimate a strict molecular clock and a coalescent growth rate model (exponential growth).
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    The log output of each MCMC run was analysed in Tracer v1.7.1 [20]. The priors used for each parameter in the analysis are given in Supplementary Data 4.
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    Functional evaluation of variants: The variants were also evaluated for the functional consequences using the Sorting Intolerant from Tolerant (SIFT).
    SIFT
    suggested: (SIFT, RRID:SCR_012813)
    The functional effects of protein variants identified in the clades were assessed using the PROVEAN web server, using the protein sequences of the Wuhan-Hu-1 genome as reference and a default threshold value of −2.5 [22]. Additionally, PhyloP conservation scores and base-wise GERP Rejected Substitutions scores (RS scores) for the variants were also computed.
    PROVEAN
    suggested: (PROVEAN, RRID:SCR_002182)
    The variants were also checked for overlaps with immune epitope predictions as given on UCSC Genome Browser for SARS-CoV-2.
    UCSC Genome Browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
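
The coverage step quoted above (Samtools depth with -a and -d 50000) produces tab-separated lines of reference name, 1-based position, and depth; with -a, positions with zero depth are included. A minimal sketch of summarizing that output follows. The function name and the `min_depth` threshold are hypothetical and are not taken from the paper.

```python
def summarize_depth(lines, min_depth=10):
    """Summarize per-base depth from `samtools depth -a` text output.

    Each non-empty line is expected to hold three tab-separated fields:
    reference name, 1-based position, depth. `min_depth` is an illustrative
    threshold chosen for this sketch, not a value from the paper.
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }
```

Because -a emits every reference position, the `positions` count should equal the reference length for a single-reference alignment, which makes the fraction a genome-wide coverage breadth rather than a breadth over covered sites only.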

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.