Analysis of Indian SARS-CoV-2 Genomes Reveals Prevalence of D614G Mutation in Spike Protein Predicting an Increase in Interaction With TMPRSS2 and Virus Infectivity

Abstract

Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, has emerged as a global pandemic. In this study, we used ARTIC primer-based amplicon sequencing to profile 225 SARS-CoV-2 genomes from India. Phylogenetic analysis of 202 high-quality assemblies identified the presence of all five reported clades (19A, 19B, 20A, 20B, and 20C) in the population. The analyses revealed Europe and Southeast Asia as two major routes for introduction of the disease in India, followed by local transmission. Interestingly, the 19B clade was found to be more prevalent in our sequenced genomes (17%) compared to other genomes reported so far from India. Haplotype network analysis showed evolution of the 19A and 19B clades in parallel, predominantly from Gujarat state in India, suggesting it to be one of the major routes of disease transmission in India during the months of March and April, whereas 20B and 20C appeared to evolve from 20A. At the same time, the 20A and 20B clades showed a prevalence of four common mutations: 241 C > T in the 5′ UTR, P4715L, F942F, and D614G in the Spike protein. The D614G mutation has been reported to increase virus shedding and infectivity. Our molecular modeling and docking analysis identified that the D614G mutation resulted in enhanced affinity of the Spike S1–S2 hinge region with the TMPRSS2 protease, possibly the reason for increased shedding of the S1 domain in G614 as compared to D614. Moreover, we also observed an increased concordance of the G614 mutation with the viral load, as evident from the decreased Ct values of the Spike and ORF1ab genes.

Article activity feed

  1. SciScore for 10.1101/2020.07.23.217430:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Raw data pre-processing: Quality of the sequenced files was checked using the FastQC tool (0.11.9) [11], followed by removal of low-quality bases (--nextseq-trim, Q<20), the Illumina Universal adapter sequence, and reads shorter than 30 bp using Cutadapt (2.10) [12].
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    All the files were then aligned to the human genome (assembly version GRCh38) using HISAT2 (2.2.0) [15], and unmapped reads were extracted using SAMTOOLS (1.10) [16] and converted to FASTQ format using the BEDTOOLS (2.29.2) [17] bamToFastq option.
    HISAT2
    suggested: (HISAT2, RRID:SCR_015530)
    The aligned files were then de-duplicated using Picard Tools (2.18.7, https://broadinstitute.github.io/picard/).
    https://broadinstitute.github.io/picard/
    suggested: (Picard, RRID:SCR_006525)
    Alignment quality was checked using SAMTOOLS(1.10)[16] flagstat option.
    SAMTOOLS(1.10)
    suggested: None
    Effects of the filtered variants were annotated using SnpEff (4.5) [20].
    SnpEff
    suggested: (SnpEff, RRID:SCR_005191)
    The loop in the spike protein (670-690) was refined using the loop modelling procedure in Modeller by generating 100 loop models.
    Modeller
    suggested: (MODELLER, RRID:SCR_008395)
    The best-ranked docking pose was visualized using Pymol.
    Pymol
    suggested: (PyMOL, RRID:SCR_000305)
    Statistical analysis and plotting: All statistical analyses and plots were generated in the R (3.6.1) statistical programming language using the ggplot2, dplyr, reshape2, lubridate, ggsci, and ggpubr packages available from the CRAN and Bioconductor (https://CRAN.R-project.org/package=tidyvers) repositories.
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    CRAN
    suggested: (CRAN, RRID:SCR_003005)
    Bioconductor
    suggested: (Bioconductor, RRID:SCR_006442)
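
The read-filtering step quoted in the first row of the table (trimming bases below Q20 and discarding reads shorter than 30 bp) can be illustrated with a minimal Python sketch. This is not Cutadapt's implementation: Cutadapt uses a running-sum trimming algorithm, and its --nextseq-trim option additionally handles the poly-G tails produced by NextSeq instruments. The sketch below simply removes trailing low-quality bases one by one. The names `FastqRecord`, `trim_3prime`, and `filter_reads` are hypothetical and assume Phred+33 quality encoding.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ read (hypothetical container, Phred+33 qualities assumed)."""
    name: str
    sequence: str
    quality: str


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(record: FastqRecord, min_quality: int = 20) -> FastqRecord:
    """Drop trailing bases whose quality is below min_quality.

    Simplified stand-in for quality trimming; it stops at the first
    base (scanning from the 3' end) that meets the threshold.
    """
    scores = phred_scores(record.quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return FastqRecord(record.name, record.sequence[:end], record.quality[:end])


def filter_reads(
    records: Iterable[FastqRecord],
    min_quality: int = 20,
    min_length: int = 30,
) -> Iterator[FastqRecord]:
    """Trim each read, then keep only reads of at least min_length bases."""
    for record in records:
        trimmed = trim_3prime(record, min_quality)
        if len(trimmed.sequence) >= min_length:
            yield trimmed
```

Calling `list(filter_reads(records))` on an iterable of `FastqRecord` objects returns the trimmed reads that pass the length threshold. The defaults mirror the Q<20 and 30 bp values quoted in the table.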

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.