Phylogenomic analysis of SARS-CoV-2 genomes from western India reveals unique linked mutations

Abstract

India has become the third worst-hit nation in the COVID-19 pandemic caused by the SARS-CoV-2 virus. Here, we investigated the molecular, phylogenomic, and evolutionary dynamics of SARS-CoV-2 in western India, the most affected region of the country. A total of 90 genomes were sequenced. Four nucleotide variants, namely C241T, C3037T, C14408T (Pro4715Leu), and A23403G (Asp614Gly), located in the 5′ UTR, Orf1a, Orf1b, and Spike protein regions of the genome, respectively, were predominant and ubiquitous (90%). Phylogenetic analysis of the genomes revealed four distinct clusters, formed owing to different variants. The major cluster (cluster 4) is distinguished by the mutations C313T, C5700A, and G28881A, which form a unique pattern and were observed in 45% of samples. We thus report a newly emerging pattern of linked mutations. The predominance of these linked mutations suggests that they are likely a part of the viral fitness landscape. A novel and distinct pattern of mutations was observed in the viral strains of each district. The Satara district viral strains showed mutations primarily at the 3′ end of the genome, while the Nashik district viral strains displayed mutations at the 5′ end of the genome. Characterization of Pune strains showed that a novel variant has overtaken the other strains. Examination of the frequency of the three mutations C313T, C5700A, and G28881A in symptomatic versus asymptomatic patients indicated an increased occurrence in symptomatic cases, which was more prominent in females. An age-specific pattern of mutations was also observed. Mutations C18877T, G20326A, G24794T, G25563T, G26152T, and C26735T were found in more than 30% of study samples in the 10-25 age group. Intriguingly, these mutations were not detected in the 61-80 age group. These findings portray the prevalence of unique linked mutations in SARS-CoV-2 in western India and their prevalence in symptomatic patients.
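As an illustration of what "linked mutations" means in practice, the following minimal Python sketch computes how often a set of mutations occurs together across samples. The sample names and mutation sets are toy data invented for this example, not the study's data, and the function names `mutation_frequency` and `linked_frequency` are hypothetical; the authors' actual analysis may differ.

```python
def mutation_frequency(samples, mutation):
    """Fraction of samples that carry a single mutation."""
    return sum(mutation in muts for muts in samples.values()) / len(samples)


def linked_frequency(samples, mutations):
    """Fraction of samples that carry every listed mutation together."""
    wanted = set(mutations)
    return sum(wanted <= muts for muts in samples.values()) / len(samples)


# Toy data, invented for illustration only (NOT taken from the study).
toy_samples = {
    "s1": {"C241T", "C3037T", "C14408T", "A23403G", "C313T", "C5700A", "G28881A"},
    "s2": {"C241T", "C3037T", "C14408T", "A23403G"},
    "s3": {"C241T", "C3037T", "C14408T", "A23403G", "C313T", "C5700A", "G28881A"},
    "s4": {"G25563T"},
}

linked = ["C313T", "C5700A", "G28881A"]
print(linked_frequency(toy_samples, linked))       # share with all three mutations
print(mutation_frequency(toy_samples, "A23403G"))  # share with one mutation
```

If the joint frequency of a set of mutations is close to the frequency of each mutation on its own, the mutations almost always travel together, which is the sense in which the abstract describes them as linked.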

Importance

Elucidation of the SARS-CoV-2 mutational landscape within a specific geographical location, and its relationship with age and symptoms, is essential to understand its local transmission dynamics and control. Here we present the first comprehensive study on genome and mutation pattern analysis of SARS-CoV-2 from the western part of India, the region worst affected by the pandemic. Our analysis revealed three unique linked mutations, which are prevalent in most of the sequences studied. These may serve as molecular markers to track the spread of this viral variant to different places.

Article activity feed

  1. SciScore for 10.1101/2020.07.30.228460:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: Ethical clearance was taken from the Institutional ethical committee for the present study.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Data analysis: Fastqc tool was used to check the quality of the raw paired-end sequences after sequencing (Andrews 2010).
    Fastqc
    suggested: (FastQC, RRID:SCR_014583)
    Adapter sequences and poor quality sequences were removed, and good quality sequences (Q>30) were selected using Trimmomatic (Bolger et al., 2014) for further analysis.
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    Reference-based genome assembly was done using BWA (Burrows-Wheeler Aligner; Li 2013) to generate the consensus sequence.
    BWA
    suggested: (BWA, RRID:SCR_010910)
    The protein variants identified in the clade were assessed to know their functional effects using PROVEAN (Protein Variation Effect Analyzer) program, considering the protein sequences of the Wuhan-Hu-1 genome as reference and a default threshold value of −2.5 (Choi and Chan, 2015).
    PROVEAN
    suggested: (PROVEAN, RRID:SCR_002182)
    Structural and bioinformatics analysis of SARS-CoV-2 variants: Multiple sequence alignment: ClustalOmega (Sievers et al., 2011) and MUSCLE (Edgar 2004) as multiple sequence alignment tool were used to align protein specific regions.
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    Structural mapping and analysis of mutations was carried out in PYMOL (DeLano et al., 2002).
    PYMOL
    suggested: (PyMOL, RRID:SCR_000305)
    In this pipeline, all the sequences, including our study samples, were aligned using MAFFT (Multiple alignments using fast Fourier transform) (Katoh and Toh, 2008).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
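The PROVEAN entry in the table above quotes a default threshold of −2.5. As a minimal sketch of how such a cutoff is typically applied, the Python snippet below labels a variant "deleterious" when its score is at or below the cutoff and "neutral" otherwise, which is the convention PROVEAN documents for its default cutoff. The function name `classify_provean` and the example scores are hypothetical and are not results from the manuscript.

```python
PROVEAN_CUTOFF = -2.5  # default threshold quoted in the manuscript


def classify_provean(score, cutoff=PROVEAN_CUTOFF):
    """Label a PROVEAN score relative to the cutoff.

    Scores at or below the cutoff are treated as deleterious,
    scores above it as neutral.
    """
    return "deleterious" if score <= cutoff else "neutral"


# Hypothetical scores for illustration only (NOT values from the study).
example_scores = {"variant_A": -3.1, "variant_B": -0.6, "variant_C": -2.5}

for name, score in example_scores.items():
    print(name, score, classify_provean(score))
```

Keeping the cutoff as a parameter makes it easy to check how sensitive the classification is to a stricter or more lenient threshold.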

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.