Phylogenetic Analysis of the Novel Coronavirus Reveals Important Variants in Indian Strains



Abstract

Recently classified as a pandemic by the WHO, the novel coronavirus 2019 has affected almost every corner of the globe, causing human deaths in the range of hundreds of thousands. The virus, which has its roots in Wuhan (China), has spread across the world through its ability to change itself. These changes affect its transmission and pathogenicity, which is why the concept of social distancing came into the picture. In this paper, a few findings from whole-genome sequence analysis of viral genome sequences submitted from India are presented. The data used for analysis comprise 440 genome sequences of the virus submitted to GenBank, GISAID, and SRA projects from around the world, as well as 28 viral sequences from India. Multiple sequence alignment of all genome sequences was performed and analysed. A novel non-synonymous mutation, 4809C>T (S1515F), in the NSP3 gene of SARS-CoV2 Indian strains is reported, along with other frequent and important changes from around the world: 3037C>T, 14408C>T, and 23403A>G. The novel change was observed in samples collected in March, but was absent in samples collected in January from persons with a travel history to China. Phylogenetic analysis clustered the sequences with this change as a separate clade. The mutation was predicted to be a stabilising change by the in silico tool DynaMut. A patient who contracted multiple strains (Wuhan and USA), to our knowledge the second such patient in the world, was observed in this study. The infected person is among the two early infected patients with a travel history to China. Strains sequenced in Iran stood out as having different variants, as most of the reported frequent variants were not observed in them. The objective of this paper is to highlight the similarities and changes observed in the submitted Indian viral strains. This helps to keep track of how the virus is changing into a new subtype.
The major strains observed were the European strain, carrying the novel change in India, and the emergent clade of Iran. It is important to observe changes in the NSP3 gene, as this gene has been reported to be under extensive positive selection and to be a potential drug target (see "Extensive Positive Selection Drives the Evolution of Nonstructural Proteins"). With the limited number of sequences, this was the only frequent novel non-synonymous change observed in Indian strains, making it a candidate for future investigation. This paper has a special focus on tracking Indian viral sequences submitted to the public domain.
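The variant names in the abstract pair a genome position with a protein change, for example 4809C>T (S1515F). The following is a minimal sketch, not the authors' method, of how a genome position can be related to a codon number. It assumes ORF1ab coordinates 266..21555 on NC_045512 with a ribosomal frameshift at 13468 (taken from the GenBank annotation, not from this article); the helper names `parse_variant` and `orf1ab_codon` are hypothetical.

```python
import re

# Assumed coordinates from the NC_045512 GenBank annotation (not stated in the article).
ORF1AB_START = 266       # first coding base of ORF1ab
ORF1AB_END = 21555       # last coding base of ORF1ab
FRAMESHIFT_SITE = 13468  # base read twice because of the -1 ribosomal frameshift


def parse_variant(text):
    """Split a name such as '4809C>T' into (position, ref_base, alt_base).

    An optional hyphen after the position (e.g. '241-C>T') is tolerated.
    """
    match = re.fullmatch(r"(\d+)-?([ACGT])>([ACGT])", text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def orf1ab_codon(position):
    """Return (codon_number, position_in_codon), both 1-based, within ORF1ab."""
    if not ORF1AB_START <= position <= ORF1AB_END:
        raise ValueError(f"position {position} is outside ORF1ab")
    offset = position - ORF1AB_START
    if position > FRAMESHIFT_SITE:
        # Downstream of the frameshift, one base has been read twice.
        offset += 1
    return offset // 3 + 1, offset % 3 + 1


for name in ("3037C>T", "4809C>T", "14408C>T"):
    position, ref_base, alt_base = parse_variant(name)
    codon, place = orf1ab_codon(position)
    print(f"{name}: ORF1ab codon {codon}, codon position {place}")
```

Under these assumptions, 4809 maps to codon 1515 of ORF1ab, which matches the S1515F label. 23403A>G lies beyond position 21555, so the helper rejects it; mapping it would need the coordinates of the gene it falls in.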

Article activity feed

  1. SciScore for 10.1101/2020.04.14.041301:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: SRA project sequences were aligned to human genome assembly (hg19) using HiSat2 [10], to remove the human contamination in viral sequences.
    Resource: HiSat2; suggested: (HISAT2, RRID:SCR_015530)

    Sentence: Next the unaligned region to hg19 was aligned to SARS-CoV2 ref sequence (NC_045512) using SamTools [13] and HiSat2.
    Resource: SamTools; suggested: (SAMTOOLS, RRID:SCR_002105)

    Sentence: Consensus fasta sequence of four viral strains of project were obtained using BcfTools [13].
    Resource: BcfTools; suggested: (SAMtools/BCFtools, RRID:SCR_005227)

    Sentence: Multiple Sequence alignment was performed on all 468 sequences using MAFFT (Multiple Alignment using Fast Fourier Transform).
    Resource: MAFFT; suggested: (MAFFT, RRID:SCR_011811)

    Sentence: Alignment was visualised in JalView 2.11.0.
    Resource: JalView; suggested: (Jalview, RRID:SCR_006459)
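The quoted sentences describe a chain of steps: filter out human reads, align the remainder to the viral reference, build a consensus, then align all genomes. The sketch below shows one way such a chain could be assembled as command lines. It is an illustration under stated assumptions, not the authors' actual commands: the article does not give parameters, single-end reads are assumed, and all file names and the `build_pipeline`, `mafft_step`, and `run` helpers are hypothetical.

```python
import subprocess


def build_pipeline(reads, human_index, virus_index, virus_fasta, prefix):
    """Return (argv, stdout_path) pairs; stdout_path is None unless output goes to stdout."""
    nonhuman = f"{prefix}.nonhuman.fq"   # reads that did not align to hg19
    sam = f"{prefix}.virus.sam"
    bam = f"{prefix}.virus.bam"
    pileup = f"{prefix}.pileup.bcf"
    vcf = f"{prefix}.calls.vcf.gz"
    consensus = f"{prefix}.consensus.fa"
    return [
        # 1. Align to the human index and keep only the unaligned reads (--un).
        (["hisat2", "-x", human_index, "-U", reads,
          "--un", nonhuman, "-S", f"{prefix}.human.sam"], None),
        # 2. Align the remaining reads to the viral reference index.
        (["hisat2", "-x", virus_index, "-U", nonhuman, "-S", sam], None),
        # 3. Sort and index the viral alignments.
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # 4. Call variants against the reference and index the result.
        (["bcftools", "mpileup", "-f", virus_fasta, "-Ob", "-o", pileup, bam], None),
        (["bcftools", "call", "-mv", "-Oz", "-o", vcf, pileup], None),
        (["bcftools", "index", vcf], None),
        # 5. Apply the called variants to the reference to get a consensus FASTA.
        (["bcftools", "consensus", "-f", virus_fasta, "-o", consensus, vcf], None),
    ]


def mafft_step(genomes_fasta, aligned_fasta):
    """MAFFT writes the alignment to stdout, so record where to redirect it."""
    return (["mafft", "--auto", genomes_fasta], aligned_fasta)


def run(steps):
    """Execute each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Hypothetical inputs; printing the plan does not require the tools to be installed.
steps = build_pipeline("sample.fq", "hg19_index", "NC_045512_index",
                       "NC_045512.fa", "sample")
steps.append(mafft_step("all_genomes.fa", "all_genomes.aligned.fa"))
for argv, stdout_path in steps:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Calling `run(steps)` would execute the plan. Note that in this sketch positions without read coverage keep the reference base in the consensus, so a real analysis would probably need a masking step as well.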

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Though they had limitations and false positives, it is nonetheless important to consider this domain as a potential drug target against SARS-CoV2. This change could be viewed as the potential next change the virus might have incorporated to form a new subtype in India in response to positive selection. A second probable reason could be that mutations in the other domains of SUD-SARS-CoV (SUD-N and SUD-M) have been reported to affect their binding to G-quadruplexes in the human host [12]. A change in SUD-C might therefore have been beneficial, not much affecting the response of the virus against the host but providing some kind of fitness in the replication or transcription mechanism. As mentioned earlier, SUD-C and the combination SUD-MC are involved in the RNA-binding process [9]. Further biochemical validation or in silico docking analyses are required to observe the effect of this change on RNA binding. To the earlier reported important mutations 241C>T, 3037C>T, 14408C>T, and 23403A>G, this additional novel mutation has been added, giving 241C>T, 3037C>T, 14408C>T, 23403A>G, and 4809C>T, presented in Figure 4. As discerned from the frequent mutations reported in Europeans, the inference could be made that these mutations are correlated with the efficiency of viral transmission, since Europe has suffered the most adverse effects of the virus. The virus has been reported to change slowly and is thereby found to be circulating in a few major forms. Looking at the data from Indian strains it has majo...
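Reporting a mutation as frequent implies counting which base each aligned genome carries at a given reference position. The sketch below illustrates that counting on a toy alignment. It is an assumption-laden example rather than the study's procedure: the sequences are invented, and the helper names `ref_to_column` and `allele_counts` are hypothetical. The key step is translating a reference coordinate into an alignment column, because gaps inserted by the aligner shift the columns.

```python
from collections import Counter


def ref_to_column(aligned_ref, position):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for column, base in enumerate(aligned_ref):
        if base != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"reference has fewer than {position} bases")


def allele_counts(alignment, ref_id, position):
    """Count the bases that non-reference sequences carry at a reference position."""
    column = ref_to_column(alignment[ref_id], position)
    return Counter(
        sequence[column].upper()
        for name, sequence in alignment.items()
        if name != ref_id
    )


# Invented toy alignment; the gap in the reference row marks an insertion in sample_1.
toy_alignment = {
    "ref":      "AC-GTCA",
    "sample_1": "ACTGTCA",
    "sample_2": "AC-GTTA",
    "sample_3": "AC-GTTA",
}

counts = allele_counts(toy_alignment, "ref", 5)
print(dict(counts))
```

In the toy data, reference position 5 is a C, one sample keeps C, and two samples carry T, so the position would be reported as a 5C>T change present in two of three samples.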

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.