Demographic Analysis of Mutations in Indian SARS-CoV-2 Isolates

Abstract

In this study, we analysed the early state-wise distribution of clades and subclades, based on shared mutations, in Indian SARS-CoV-2 isolates collected between 27th Jan and 27th May 2020. Phylogenetic analysis of these isolates indicates multiple independent sources of introduction of the virus in the country, while principal component analysis revealed some state-specific clusters. The clade 20A defining mutations C241T (ORF1ab: 5’ UTR), C3037T (ORF1ab: F924F), C14408T (ORF1ab: P4715L), and A23403G (S: D614G) were predominant in Indian isolates during this period. A higher number of coronavirus cases was observed in certain states, viz., Delhi, Tamil Nadu, and Telangana. Genetic analysis of isolates from these states revealed a cluster with shared mutations C6312A (ORF1ab: T2016K), C13730T (ORF1ab: A4489V), C23929T, and C28311T (N: P13L). Analysis of region-specific shared mutations, carried out to understand the large number of deaths in Gujarat and Maharashtra, identified shared mutations defining the subclade I/GJ-20A (C18877T, C22444T, G25563T (ORF3a: H57Q), C26735T, C28854T (N: S194L), C2836T) in Gujarat and two sets of co-occurring mutations, C313T, C5700A (ORF1ab: A1812D) and A29827T, G29830T, in Maharashtra. The insights gained from the genetic analysis of the mutation spectra of Indian isolates regarding the transmission, geographic distribution, containment, and impact of the virus are discussed.

Article activity feed

  1. Discussion, revision and decision


    Discussion and Revision


    Author response

    We would like to thank the reviewers for their valuable comments. Below we provide point-wise responses and the changes made in the revised manuscript.

    To Dr. Jyotsnamayee Sabat

    Pt-13: I want to know how the representative sequences were selected for different states. Is it based on no. of sequences submitted or positivity rate of a particular region?

    All the Indian isolates available in GISAID for the period 27th Jan – 27th May 2020 were downloaded and considered for analysis. No state-wise selection was done.

    To Dr Parvin Abraham

    Pt-12: The dataset is only from 27th Jan – 27th May 2020. Maybe they can include more Numbers.

    The period of data collection was restricted to 27th Jan – 27th May 2020 in order to understand the variations observed across different states of the country during the early phase of the pandemic. Also, we are interested in assessing the impact of the lockdown in containing the spread of COVID-19 and state-specific subclusters, if any.

    To Hurng-Yi Wang:

    Pt-13: Agarwal and Parekh analyzed 685 SARS-CoV-2 isolates collected during 27th Jan - 27th May 2020 from India and described the distribution of virus strains and mutations across the country. While the information might be valuable to some local readers, the results are mainly descriptive and the data are a bit out of date. In addition, I have the following comments.

    The period of data collection is restricted to 27th Jan – 27th May 2020 in order to understand the variations observed across different states of the country during the early phase of the pandemic. Also, we are interested in assessing the impact of the lockdown in containing the spread of COVID-19 and state-specific subclusters, if any.

    1. Some details of the methods are lacking. For example, the MUpro provides two methods, it is necessary to specify which method was used in the analysis. The confidence score of each prediction should also be provided. Besides, some results from I-Mutant and MUpro were conflicting, the authors may want to discuss the discrepancy.

    In the revised manuscript we give the sign of DDG predicted by I-Mutant2.0 and MUpro along with the respective confidence scores. For I-Mutant2.0, the predicted sign of the protein stability change and the reliability index (which indicates the confidence of the prediction) are now incorporated in Table-1. Similarly, the sign change and confidence scores given by MUpro using its SVM- and NN-based models have been incorporated. We expect all the models to give the same results, except in cases where the predictions may be hard to make. This has now been explicitly mentioned in the Materials and Methods section: “In I-Mutant2.0, the sign of DDG is based on SVM classifier, and the associated confidence score is given by the reliability index. On the other hand, MUpro provides sign change prediction using two models, one SVM-based and the other using Neural Networks. In Table-1, the predicted sign of DDG by I-Mutant2.0 and MUpro along with the respective confidence scores is reported.”
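The comparison described in this response, where three predictors each report a sign for DDG and are expected to agree except in hard cases, can be sketched as follows. This is only an illustration and not the authors' pipeline: the function name `flag_conflicts`, the dictionary layout, and the placeholder mutation labels and signs are assumptions and are not values from Table-1.

```python
from typing import Dict, List


def flag_conflicts(predictions: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the mutations for which the predictors disagree on the DDG sign.

    `predictions` maps a mutation label to a {predictor name: sign} mapping,
    where sign is "+" (stabilising) or "-" (destabilising).
    """
    conflicts = []
    for mutation, sign_by_tool in predictions.items():
        # More than one distinct sign means at least two predictors disagree.
        if len(set(sign_by_tool.values())) > 1:
            conflicts.append(mutation)
    return conflicts


# Placeholder signs for illustration only (hypothetical, not from Table-1).
example = {
    "mutation_1": {"I-Mutant2.0": "-", "MUpro_SVM": "-", "MUpro_NN": "-"},
    "mutation_2": {"I-Mutant2.0": "+", "MUpro_SVM": "-", "MUpro_NN": "-"},
}

conflicting = flag_conflicts(example)
print(conflicting)
```

Mutations flagged this way would be the ones where the confidence scores (the reliability index for I-Mutant2.0, the model scores for MUpro) deserve a closer look.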

    1. The “Analysis of the Mutational Profile of Indian Isolates” should be moved to Materials and Methods.

    There was indeed some redundancy between the information in the Materials and Methods section and in the section “Analysis of the Mutational Profile of Indian Isolates”. We have now edited the Materials and Methods section appropriately and deleted the paragraph under the above-mentioned section.

    1. The authors provided lengthy discussion about the effect of each mutation in some lineages, such as 20A and I/A3i. However, as these mutations are tightly linked, the effect of each individual mutation is difficult to assess. It is possible that some of the mutations are just hitchhikers. They may want to address this alternative point.

    For 20A we define the haplotype comprising four co-occurring mutations D614G, C241T, C3037T, and C14408T. Similarly, six co-occurring mutations C6312A, C13730T, C23929T, C28311T, C6310A (S2015R) and C19524T are shown to be associated with subclade I/A3i. Together as a set, these are useful in identifying clusters or groups of isolates with similar mutational profiles. However, the non-synonymous mutations among them are likely to have some individual impact on the overall stability of the respective protein, and so we have presented both sets of results. To address this point, we have added a sentence at the end of the Materials and Methods section, reproduced below: “While we report individual effects of mutations on protein stability, some of the mutations in a haplotype may not be under natural selection and are just hitchhiking mutations.”
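Identifying isolates by a set of co-occurring mutations, as described in this response, amounts to a subset check on each isolate's mutation list. The sketch below assumes each isolate is represented as a set of nucleotide-change labels. The four 20A labels are those named in the abstract (A23403G corresponds to S: D614G), while the isolate names, their mutation sets, and the helper functions are hypothetical.

```python
# Clade 20A defining mutations as listed in the abstract.
HAPLOTYPE_20A = frozenset({"C241T", "C3037T", "C14408T", "A23403G"})


def carries_haplotype(mutations, haplotype):
    """True if every mutation in the haplotype is present in the isolate."""
    return haplotype.issubset(mutations)


def haplotype_fraction(isolates, haplotype):
    """Fraction of isolates carrying the complete haplotype."""
    if not isolates:
        return 0.0
    carriers = sum(
        carries_haplotype(muts, haplotype) for muts in isolates.values()
    )
    return carriers / len(isolates)


# Toy isolates for illustration only (hypothetical, not GISAID records).
TOY_ISOLATES = {
    "isolate_A": {"C241T", "C3037T", "C14408T", "A23403G", "G25563T"},
    "isolate_B": {"C241T", "C3037T", "C14408T", "A23403G"},
    "isolate_C": {"C6312A", "C13730T", "C23929T", "C28311T"},
    "isolate_D": {"C241T", "C3037T"},  # partial match, not counted
}

fraction = haplotype_fraction(TOY_ISOLATES, HAPLOTYPE_20A)
print(fraction)
```

A check of this kind only establishes co-occurrence. It says nothing about which member of the haplotype, if any, is under selection, which is the hitchhiking caveat raised by the reviewer.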

    1. Several figures are confusing and lack detail. The diversity plots of Figure 3 and Figure 8 are hard to be precisely compared to the mutations that occurred among different plots. Phylogenetic trees, as well as their figure legends, are confusing, especially Figure 9 and Figure 10. For Figure 9, it is impossible to tell which mutation site had changed from C to T. For Figure 10, spots depicted in yellow are both position 29827 A>T and position 29830 G>T, green spot only notes as G, but A29827 is not mentioned in the figure. Furthermore, the mutation position of blue spot C cannot be found.

    We have now redrawn the diversity plots in Figure 3 and Figure 8 (labelled Figure 2 and Figure 4, respectively, in the revised manuscript), which are shown below. We have introduced horizontal lines to show the height of the divergence line at the variant positions discussed in the manuscript, and these are marked with the same colour in the corresponding subplots for comparison.

    In the revised manuscript, Figures 9 and 10 are now Supplementary Figures 2c and 2d respectively. The new figure legends are: Supplementary Figure 2: The sequences carrying the mutations a) C5700A b) C23929T c) C18877T d) G29830T are depicted in yellow colour. Figure 10 (now Supplementary Figure 2(d)) is now re-plotted, and we have removed the blue dot corresponding to ‘C’ since no samples from India had this variation.

    1. Figure 9 and Figure 10 were not mentioned inside the text.

    These references have now been added to the manuscript: Supplementary Figure 2(c) – On Pg-9, in the first line under the heading “Identification of novel subclade I/GJ-20A and unique mutations in Maharashtra”. Supplementary Figure 2(d) – On Pg-11, in the last paragraph under the heading “Identification of novel subclade I/GJ-20A and unique mutations in Maharashtra”.

    1. The Top 10 mutations in PCA analysis are the mutations in 20A and I/A3i. It is reasonable to observe a clear association of the clusters with the clades. It is not clear, however, how these distributions correlate with lockdown, contact tracing and quarantine measures.

    In Supplementary Figure 1, clade 20A (shown in ‘Green’) is predominantly observed in Gujarat (178/201), and the distribution of clade 19A (shown in ‘Blue’) is high in Telangana (75/97), followed by Delhi (55/76), Maharashtra (31/80), and Tamil Nadu (19/34). Four mutations, C6312A, C13730T, C23929T, and C28311T, are reported to be associated with subclade I/A3i, which is an India-specific subclade of 19A. These co-occurring mutations are found in ~32% of the Indian samples sequenced (till 31st May 2020). Only 5 isolates of this subclade were observed in India after May, with the last one dated 13th June 2020 (according to data available in Nextstrain). This indicates that the spread of subclade I/A3i had been largely contained during the lockdown through contact tracing and quarantining of infected individuals. Also, Telangana and Delhi isolates cluster together due to shared I/A3i mutations, primarily due to the Tablighi Jamaat congregation that occurred just before the lockdown was announced. Similarly, clade 20A defining mutations were observed to occur in ~90% of Gujarat samples. Due to the countrywide lockdown from 25th March 2020, this clade and its sub-clusters were localized in the state, defined by Gujarat-specific mutations, e.g., I/GJ-20A.
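The state-wise counts quoted in this response can be converted to percentages for easier comparison. The counts below are copied from the response (isolates in the clade / isolates sequenced for that state); the dictionary and function names are hypothetical.

```python
# (isolates in clade, isolates sequenced) as quoted in the author response.
CLADE_COUNTS = {
    "Gujarat (20A)": (178, 201),
    "Telangana (19A)": (75, 97),
    "Delhi (19A)": (55, 76),
    "Maharashtra (19A)": (31, 80),
    "Tamil Nadu (19A)": (19, 34),
}


def clade_percentages(counts):
    """Convert (in_clade, total) pairs into percentages per state."""
    return {
        state: 100 * in_clade / total
        for state, (in_clade, total) in counts.items()
    }


percentages = clade_percentages(CLADE_COUNTS)

# Print states from highest to lowest clade share.
for state, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {pct:.1f}%")
```

178/201 comes to a little under 89%, consistent with the "~90% of Gujarat samples" stated above. Note that ranking by proportion can differ from ranking by raw count: Tamil Nadu (19/34) has a higher 19A share than Maharashtra (31/80).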


    Hurng-Yi Wang:

    I agree to change to Verified manuscript.


    Decision

    Verified manuscript

    Dr. Abraham: Verified manuscript

    Dr. Sabat: Verified manuscript

    Dr. Wang: Verified manuscript

  2. Peer review report

    Reviewer: Hurng-Yi Wang Institution: Institute of Ecology and Evolutionary Biology, National Taiwan University email: hurngyi@gmail.com


    Section 1 – Serious concerns

    • Do you have any serious concerns about the manuscript such as fraud, plagiarism, unethical or unsafe practices? No
    • Have authors’ provided the necessary ethics approval (from authors’ institution or an ethics committee)? not applicable

    Section 2 – Language quality

    • How would you rate the English language quality? Medium quality

    Section 3 – validity and reproducibility

    • Does the work cite relevant and sufficient literature? No
    • Is the study design appropriate and are the methods used valid? No
    • Are the methods documented and analysis provided so that the study can be replicated? Yes
    • Is the source data that underlies the result available so that the study can be replicated? Yes
    • Is the statistical analysis and its interpretation appropriate? Yes
    • Is quality of the figures and tables satisfactory? No
    • Are the conclusions adequately supported by the results? No
    • Are there any objective errors or fundamental flaws that make the research invalid?

    Section 4 – Suggestions

    • In your opinion how could the author improve the study?

    • Do you have any other feedback or comments for the Author?

    Agarwal and Parekh analyzed 685 SARS-CoV-2 isolates collected during 27th Jan - 27th May 2020 from India and described the distribution of virus strains and mutations across the country. While the information might be valuable to some local readers, the results are mainly descriptive and the data are a bit out of date. In addition, I have the following comments.

    1. Some details of the methods are lacking. For example, the MUpro provides two methods, it is necessary to specify which method was used in the analysis. The confidence score of each prediction should also be provided. Besides, some results from I-Mutant and MUpro were conflicting, the authors may want to discuss the discrepancy.
    2. The “Analysis of the Mutational Profile of Indian Isolates” should be moved to Materials and Methods.
    3. The authors provided lengthy discussion about the effect of each mutation in some lineages, such as 20A and I/A3i. However, as these mutations are tightly linked, the effect of each individual mutation is difficult to assess. It is possible that some of the mutations are just hitchhikers. They may want to address this alternative point.
    4. Several figures are confusing and lack detail. The diversity plots of Figure 3 and Figure 8 are hard to be precisely compared to the mutations that occurred among different plots. Phylogenetic trees, as well as their figure legends, are confusing, especially Figure 9 and Figure 10. For Figure 9, it is impossible to tell which mutation site had changed from C to T. For Figure 10, spots depicted in yellow are both position 29827 A>T and position 29830 G>T, green spot only notes as G, but A29827 is not mentioned in the figure. Furthermore, the mutation position of blue spot C cannot be found.
    5. Figure 9 and Figure 10 were not mentioned inside the text.
    6. The Top 10 mutations in PCA analysis are the mutations in 20A and I/A3i. It is reasonable to observe a clear association of the clusters with the clades. It is not clear, however, how these distributions correlate with lockdown, contact tracing and quarantine measures.

    Section 5 – Decision

    Requires revisions

  3. Peer review report

    Reviewer: Dr. Jyotsnamayee Sabat Institution: Regional VRDL, RMRC(ICMR). email: jyotsnasabat@yahoo.com


    Section 1 – Serious concerns

    • Do you have any serious concerns about the manuscript such as fraud, plagiarism, unethical or unsafe practices? No
    • Have authors’ provided the necessary ethics approval (from authors’ institution or an ethics committee)? not applicable

    Section 2 – Language quality

    • How would you rate the English language quality? Good quality

    Section 3 – validity and reproducibility

    • Does the work cite relevant and sufficient literature? Yes
    • Is the study design appropriate and are the methods used valid? Yes
    • Are the methods documented and analysis provided so that the study can be replicated? Yes
    • Is the source data that underlies the result available so that the study can be replicated? Yes
    • Is the statistical analysis and its interpretation appropriate? Yes
    • Is quality of the figures and tables satisfactory? Yes
    • Are the conclusions adequately supported by the results? Yes
    • Are there any objective errors or fundamental flaws that make the research invalid?

    No such application was observed.


    Section 4 – Suggestions

    • In your opinion how could the author improve the study?

    They have analysed it in-depth and presented nicely.

    • Do you have any other feedback or comments for the Author?

    I want to know how the representative sequences were selected for different states. Is it based on no. of sequences submitted or positivity rate of a particular region?


    Section 5 – Decision

    Verified manuscript

  4. Peer review report

    Reviewer: Parvin Abraham Institution: MIMS Research Foundation, Calicut, Kerala, India. email: parvinabraham@gmail.com


    Section 1 – Serious concerns

    • Do you have any serious concerns about the manuscript such as fraud, plagiarism, unethical or unsafe practices? No
    • Have authors’ provided the necessary ethics approval (from authors’ institution or an ethics committee)? not applicable

    Section 2 – Language quality

    • How would you rate the English language quality? High quality

    Section 3 – validity and reproducibility

    • Does the work cite relevant and sufficient literature? Yes
    • Is the study design appropriate and are the methods used valid? Yes
    • Are the methods documented and analysis provided so that the study can be replicated? Yes
    • Is the source data that underlies the result available so that the study can be replicated? Yes
    • Is the statistical analysis and its interpretation appropriate? Yes
    • Is quality of the figures and tables satisfactory? Yes
    • Are the conclusions adequately supported by the results? Yes
    • Are there any objective errors or fundamental flaws that make the research invalid? Please describe these thoroughly. No

    Section 4 – Suggestions

    • In your opinion how could the author improve the study?

    The dataset is only from 27th Jan – 27th May 2020. Maybe they can include more Numbers.

    • Do you have any other feedback or comments for the Author? No

    Section 5 – Decision

    Verified manuscript

  5. SciScore for 10.1101/2021.09.22.461342:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.