Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms

Abstract

No abstract available

Article activity feed

  1. SciScore for 10.1101/2021.08.15.456422:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The coronavirus cleavage sites: At least one polyprotein (ORF1ab) sequence with the known cleavage sites of the 3CL protease was obtained for each of 14 coronavirus species in three genera (Alphacoronavirus, Betacoronavirus and Gammacoronavirus) from the NCBI RefSeq and protein databases on April 15th, 2021 (Table S1).
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    For each coronavirus species with the known cleavage sites on at least one polyprotein, the polyprotein sequences were obtained from the NCBI protein database and were aligned with MAFFT (version 7.427) (Katoh and Standley, 2013).
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The AA indexes: A total of 566 AA indexes were obtained from the AAindex database (version 9.2) on November 18th, 2020 (Kawashima et al., 2008).
    AAindex
    suggested: None
    The locations of human proteins were inferred from the subcellular locations provided by the UniProt database.
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
    Logo of sequences centered on the cleavage sites: The logo of sequences centered on the cleavage sites was generated with WebLogo 3 using the default parameters on April 16th, 2021 (Crooks et al., 2004).
    WebLogo
    suggested: (WEBLOGO, RRID:SCR_010236)
    The Principal Component Analysis (PCA): The PCA of the AA indexes was performed using the sklearn.decomposition module in Python (version 3.7) (Pedregosa et al., 2011).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Functional enrichment analysis of human genes: The KEGG pathway and GO enrichment analyses were conducted with the enrichKEGG and enrichGO functions in the clusterProfiler package (version 3.18.1) in R (version 4.0.5) (Yu et al., 2012).
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    clusterProfiler
    suggested: (clusterProfiler, RRID:SCR_016884)
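
For readers unfamiliar with the PCA step listed in the table above, the following is a minimal sketch of how a matrix of amino-acid indexes could be reduced with `sklearn.decomposition.PCA`. It is not the authors' code. The matrix here is random placeholder data shaped as 20 amino acids by 566 indexes. The standardization step, the choice of 5 components, and the names `aa_index_matrix` and `aa_to_features` are all assumptions made for illustration. Real AAindex entries may also contain missing values that would need handling first.

```python
import numpy as np
from sklearn.decomposition import PCA

# The 20 standard amino acids, one row each in the index matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder data (assumption): random values standing in for the
# 566 AAindex properties. Real values would come from the AAindex database.
rng = np.random.default_rng(0)
aa_index_matrix = rng.normal(size=(len(AMINO_ACIDS), 566))

# Standardize each index (column) so that properties measured on
# different scales contribute comparably (assumed preprocessing step).
standardized = (aa_index_matrix - aa_index_matrix.mean(axis=0)) / aa_index_matrix.std(axis=0)

# Project the amino acids onto a handful of principal components.
# n_components=5 is an arbitrary illustrative choice.
pca = PCA(n_components=5)
aa_components = pca.fit_transform(standardized)

# Map each amino acid to its low-dimensional feature vector, which could
# then be used to encode residues in a sequence window.
aa_to_features = {aa: aa_components[i] for i, aa in enumerate(AMINO_ACIDS)}

print(aa_components.shape)
print(pca.explained_variance_ratio_.round(3))
```

Inspecting `pca.explained_variance_ratio_` is one common way to judge how many components are worth keeping.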

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There were some limitations in the study. Firstly, although the dataset was much larger than that used in previous studies, the dataset was still limited in size. The cleavage sites of the Deltacoronavirus were not included in the analysis. Nevertheless, the computational model developed here still showed high accuracy in both validation and testing, suggesting its potential use in predicting the cleavage sites of the coronavirus 3CL protease. Secondly, the P1 position of the coronavirus 3C-like protease cleavage site was assumed to be highly conserved as Q, although there were a few cleavage sites with other residues in the P1 position. More experimental efforts are needed to determine the AA specificity of the coronavirus 3CL protease cleavage sites. Thirdly, only 12 experimentally validated cleavage sites were used to test the model. More experimental validation is needed to evaluate the performance of the model.
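
To make the second limitation concrete, the sketch below lists candidate cleavage windows under the assumption that P1 must be Q. It is illustrative only and not taken from the manuscript. The function name `candidate_sites`, the window of four residues on each side (P4 to P4'), and the toy sequence are all hypothetical choices. Any site with a non-Q residue at P1 would be missed by such a filter, which is the point the limitation raises.

```python
def candidate_sites(sequence, left=4, right=4, p1="Q"):
    """Return (P1 position, window) pairs for every residue equal to `p1`.

    Positions are 1-based. The window spans `left` residues ending at P1
    and `right` residues starting at P1'. Residues too close to either
    end of the sequence to give a full window are skipped.
    """
    sites = []
    for i, residue in enumerate(sequence):
        if residue != p1:
            continue
        start = i - left + 1   # index of the first residue in the window
        end = i + right + 1    # exclusive index just past the last residue
        if start < 0 or end > len(sequence):
            continue
        sites.append((i + 1, sequence[start:end]))
    return sites


# Toy fragment (assumption): one internal Q with full flanks, and one Q
# at the very end that is skipped because it has no downstream residues.
toy_fragment = "TSAVLQSGFRKMQ"
for position, window in candidate_sites(toy_fragment):
    print(position, window)
```

In a full pipeline, each window returned here would be encoded as features and scored by a classifier; this sketch covers only the candidate enumeration.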

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.