Characterization of the substitution hotspots in SARS-CoV-2 genome using BioAider and detection of a SR-rich region in N protein providing further evidence of its animal origin

This article has been reviewed by the following groups


Abstract

The novel human coronavirus SARS-CoV-2 has caused the coronavirus disease 2019 (COVID-19) pandemic worldwide. The rapidly growing body of sequencing data has revealed abundant single-nucleotide variations in the SARS-CoV-2 genome. However, it remains difficult to quickly analyze genomic variation and screen for key mutations of SARS-CoV-2. In this study, we developed a graphical program, named BioAider, for quick and convenient sequence annotation and mutation analysis of multiple genome sequences. Using BioAider, we conducted a comprehensive analysis of genome variation in 3,240 SARS-CoV-2 genome sequences. We detected 14 substitution hotspots within the SARS-CoV-2 genome: 10 non-synonymous and 4 synonymous. Among these hotspots, NSP13-Y541C was predicted to be a crucial substitution that might affect the unwinding activity of NSP13, a key protein for viral replication. We also found 3 groups of potentially linked substitution hotspots that merit further study. In particular, we discovered a serine/arginine-rich (SR-rich) region (aa 184-204) in the N protein of SARS-CoV-2 that differs from the corresponding region of SARS-CoV, indicating a more complex replication mechanism and a unique N-M interaction in SARS-CoV-2. Interestingly, the number of SRXX repeat fragments in the SR-rich region closely reflected the evolutionary relationships between SARS-CoV-2 and SARS-CoV-2-related animal coronaviruses, providing further evidence of its animal origin. Overall, we developed an efficient tool for rapid identification of mutations, identified substitution hotspots in SARS-CoV-2 genomes, and detected a distinctive polymorphic SR-rich region in the N protein. This tool and the detected hotspots could facilitate viral genomic studies and may help in screening for antiviral target sites.
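The abstract does not say how BioAider defines a substitution hotspot or labels it synonymous versus non-synonymous. As a minimal sketch only, assuming a hotspot is an aligned site where the most common non-reference base reaches some frequency cut-off (`min_freq`, a hypothetical parameter), and assuming the standard genetic code, the idea can be illustrated as follows. The function names and the toy sequences are illustrative and are not BioAider's code or the study's data.

```python
from collections import Counter

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_hotspots(reference, sequences, min_freq=0.1):
    """Return {position: (ref_base, alt_base, frequency)} for 1-based sites where the
    most common non-reference base occurs in at least min_freq of all sequences.
    Assumes every sequence is already aligned to the reference (same length)."""
    hotspots = {}
    n = len(sequences)
    for pos, ref_base in enumerate(reference):
        # Gaps ("-") and ambiguous bases ("N") are not counted as substitutions.
        alts = Counter(s[pos] for s in sequences if s[pos] not in (ref_base, "-", "N"))
        if not alts:
            continue
        alt_base, count = alts.most_common(1)[0]
        if count / n >= min_freq:
            hotspots[pos + 1] = (ref_base, alt_base, count / n)
    return hotspots


def classify_substitution(reference, cds_start, position, alt_base):
    """Classify a single-base substitution at a 1-based genome position inside a
    coding sequence that starts, in frame, at cds_start (1-based)."""
    offset = position - cds_start               # 0-based offset within the CDS
    codon_start = cds_start - 1 + (offset // 3) * 3
    ref_codon = reference[codon_start:codon_start + 3]
    i = offset % 3                              # position of the change inside the codon
    alt_codon = ref_codon[:i] + alt_base + ref_codon[i + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, f"{ref_aa}{offset // 3 + 1}{alt_aa}"


reference = "ATGAAACTGGGT"  # toy 4-codon CDS: M K L G
sequences = ["ATGAAACTGGGT", "ATGAAGCTGGGC", "ATGAAGCTGGGT", "ATGGAGCTGGGT"]
print(find_hotspots(reference, sequences, min_freq=0.5))  # {6: ('A', 'G', 0.75)}
print(classify_substitution(reference, 1, 6, "G"))        # ('synonymous', 'K2K')
print(classify_substitution(reference, 1, 4, "G"))        # ('non-synonymous', 'K2E')
```

The amino-acid labels follow the same reference-residue, position, alternative-residue pattern as NSP13-Y541C; for real genomes, the CDS start of each gene would have to come from an annotation of the reference genome.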
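The abstract reports "potentially linked" hotspots without stating the linkage measure. One common way to quantify whether two substitutions travel together is the squared correlation (r², a linkage-disequilibrium statistic) between per-sequence presence/absence indicators. The sketch below uses r² as a swapped-in, assumed technique; the `min_r2` cut-off, the function names, and the toy alignment are all hypothetical.

```python
from itertools import combinations


def allele_indicator(sequences, position, alt_base):
    """1 where a sequence carries alt_base at the 1-based position, otherwise 0."""
    return [1 if s[position - 1] == alt_base else 0 for s in sequences]


def r_squared(a, b):
    """Squared correlation (r^2 linkage disequilibrium) between two 0/1 vectors."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a site with no variation carries no linkage information
    d = p_ab - p_a * p_b
    return d * d / denom


def linked_pairs(sequences, hotspots, min_r2=0.8):
    """hotspots maps 1-based position -> alternative base. Returns (pos1, pos2, r2)
    for every pair of hotspots whose r^2 reaches min_r2 (a hypothetical cut-off)."""
    ind = {pos: allele_indicator(sequences, pos, alt) for pos, alt in hotspots.items()}
    pairs = []
    for p, q in combinations(sorted(ind), 2):
        r2 = r_squared(ind[p], ind[q])
        if r2 >= min_r2:
            pairs.append((p, q, round(r2, 3)))
    return pairs


sequences = ["ACGT", "GCTT", "GCTA", "ACGA"]  # toy alignment
print(linked_pairs(sequences, {1: "G", 3: "T", 4: "A"}))  # [(1, 3, 1.0)]
```

Here positions 1 and 3 always change together, so their r² is 1.0, while position 4 varies independently of both. A group of linked hotspots would then be a set of positions connected by such high-r² pairs.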
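The abstract compares viruses by the number of SRXX repeat fragments in the SR-rich region but does not define the motif precisely. Assuming an SRXX unit is serine (S) followed by arginine (R) and then any two residues, counted without overlap, a count over a 1-based residue range could be sketched as below. The regular expression, the function name, and the toy fragment are hypothetical illustrations, not the authors' procedure or a real N protein sequence.

```python
import re

# Assumed definition: an "SRXX" unit is S, then R, then any two residues.
SRXX = re.compile(r"SR..")


def count_srxx(protein, start, end):
    """Count non-overlapping SRXX units in residues start..end (1-based, inclusive)."""
    region = protein[start - 1:end]
    return len(SRXX.findall(region))


toy_n_fragment = "MSRGGSRSSAASRNT"  # illustrative sequence, not a real N protein
print(count_srxx(toy_n_fragment, 1, 15))  # 3
print(count_srxx(toy_n_fragment, 2, 9))   # 2
```

For SARS-CoV-2 the abstract places the region at aa 184-204 of N; for related animal coronaviruses the equivalent coordinates would first have to be found by aligning their N proteins, since residue numbering need not match.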

Article activity feed

  1. SciScore for 10.1101/2020.06.04.135293:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentences | Resources
    Main functions and working principle of BioAider: BioAider V1.0 was developed based on Python 3.7 and R 3.5.2, and used PyQt5 for interface packaging. | BioAider (suggested: None); Python (suggested: IPython, RRID:SCR_001658)
    Multiple sequence alignment of genomic sequences of SARS-CoV-2 were accomplished using MAFFT v7.407 [40]. | MAFFT (suggested: MAFFT, RRID:SCR_011811)
    Then we used MUSCLE program in MEGA v7.0.14 to align these coding genes based on codons method [41]. | MUSCLE (suggested: MUSCLE, RRID:SCR_011812); MEGA (suggested: Mega BLAST, RRID:SCR_011920)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 29 and 30. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e., not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as reporting of sex and investigator blinding. For details on the theoretical underpinning of the rigor criteria and the tools shown here, including the references cited, see the SciScore website.
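How SciScore itself detects RRIDs is not described here. As a rough sketch, assuming RRIDs appear in text as "RRID:" followed by an authority prefix and an accession (as in the suggestions in Table 2, e.g. RRID:SCR_011811), candidate identifiers can be pulled out with a regular expression. The pattern and function name are hypothetical; the sketch only checks that a token looks like an RRID and does not confirm that the identifier is registered.

```python
import re

# Matches "RRID:" then a prefix such as SCR_, AB_ or CVCL_ and its accession.
# Closing parentheses and trailing periods are not part of the captured token.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:\-]+)")


def find_rrids(text):
    """Return RRID-like accessions mentioned in text, in order of appearance."""
    return RRID_PATTERN.findall(text)


text = "suggested: (MAFFT, RRID:SCR_011811) ... suggested: (MUSCLE, RRID:SCR_011812)"
print(find_rrids(text))  # ['SCR_011811', 'SCR_011812']
```

Checking correctness, as SciScore reports doing, would additionally require looking each accession up in the RRID registry.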