Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
SciScore for 10.1101/2021.08.15.456422:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: The coronavirus cleavage sites: At least one polyprotein (ORF1ab) sequence with the known cleavage sites of the 3CL protease was obtained for 14 coronavirus species in three genera (Alphacoronavirus, Betacoronavirus and Gammacoronavirus) from the NCBI RefSeq and protein databases on April 15th, 2021 (Table S1).
  Resources: RefSeq suggested: (RefSeq, RRID:SCR_003496)
- Sentence: For each coronavirus species with the known cleavage sites on at least one polyprotein, the polyprotein sequences were obtained from the NCBI protein database and were aligned with MAFFT (version 7.427) (Katoh and Standley, 2013).
  Resources: NCBI suggested: (NCBI, RRID:SCR_006472); MAFFT suggested: (MAFFT, RRID:SCR_011811)
- Sentence: The AA indexes: A total of 566 AA indexes were obtained from the AAindex database (version 9.2) on November 18th, 2020 (Kawashima et al., 2008).
  Resources: AAindex suggested: None
- Sentence: The locations of human proteins were inferred from the subcellular locations provided by the UniProt database.
  Resources: UniProt suggested: (UniProtKB, RRID:SCR_004426)
- Sentence: Logo of sequences centered on the cleavage sites: The logo of sequences centered on the cleavage sites was generated with WebLogo 3 using the default parameters on April 16th, 2021 (Crooks et al., 2004).
  Resources: WebLogo suggested: (WEBLOGO, RRID:SCR_010236)
- Sentence: The Principal Component Analysis (PCA): The PCA of the AA indexes was performed with the sklearn.decomposition module in Python (version 3.7) (Pedregosa et al., 2011).
  Resources: Python suggested: (IPython, RRID:SCR_001658)
- Sentence: Functional enrichment analysis of human genes: The KEGG pathway and GO enrichment analyses were conducted with the enrichKEGG and enrichGO functions of the clusterProfiler package (version 3.18.1) in R (version 4.0.5) (Yu et al., 2012).
  Resources: KEGG suggested: (KEGG, RRID:SCR_012773); clusterProfiler suggested: (clusterProfiler, RRID:SCR_016884)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
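A minimal sketch can illustrate the PCA step listed in Table 2 (sklearn.decomposition in Python). It rests on several assumptions that the report does not state:

- The AA indexes are arranged as a 20 × 566 matrix, with one row per standard amino acid and one column per index.
- Each index is standardized before PCA.
- Five components are kept.

The matrix is filled with random placeholder values so that the sketch runs. Real values would come from the AAindex database, and a real AAindex matrix may contain missing values that would need to be handled first.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
N_INDEXES = 566  # number of AA indexes reported in Table 2

# Placeholder data (assumption): random values stand in for the real AAindex
# matrix so the sketch runs. Rows = amino acids, columns = AA indexes.
rng = np.random.default_rng(seed=0)
aa_index_matrix = rng.normal(size=(len(AMINO_ACIDS), N_INDEXES))

# Standardize each index (column) so that indexes measured on different
# scales contribute comparably to the components (assumed preprocessing).
scaled = StandardScaler().fit_transform(aa_index_matrix)

# Fit PCA and project each amino acid onto the first five components
# (the number of components kept is a hypothetical choice).
pca = PCA(n_components=5)
aa_components = pca.fit_transform(scaled)

# Map each amino acid to its compact feature vector.
aa_features = dict(zip(AMINO_ACIDS, aa_components))

print(aa_components.shape)  # (20, 5)
print(pca.explained_variance_ratio_.round(3))
```

Each residue in a cleavage-site window could then be encoded by its row in `aa_features`, reducing 566 correlated indexes to a handful of features. The report does not say whether the study used this matrix orientation or this number of components.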
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: There were some limitations in the study. First, although the dataset was much larger than those used in previous studies, it was still limited in size. The cleavage sites of the Deltacoronavirus genus were not included in the analysis. Nevertheless, the computational model developed here still showed high accuracy in both validation and testing, suggesting its potential use in predicting the cleavage sites of the coronavirus 3CL protease. Second, the P1 position of the coronavirus 3C-like protease cleavage site was assumed to be a highly conserved Q (glutamine), although a few cleavage sites had other residues at the P1 position. More experimental efforts are needed to determine the AA specificity of the coronavirus 3CL protease cleavage sites. Third, only 12 experimentally validated cleavage sites were used to test the model. More experimental validations are needed to evaluate the performance of the model.
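A short sketch can make the P1 question from the second limitation concrete. It cuts fixed-length windows around cleavage sites and tallies the residue found at P1. The following are hypothetical choices made only for illustration and are not data or settings from the study:

- a window of four residues on each side (P4 to P4');
- '-' padding at the sequence ends;
- the toy sequence and cleavage positions.

```python
from collections import Counter
from typing import Iterable


def cleavage_window(sequence: str, cut: int, upstream: int = 4, downstream: int = 4) -> str:
    """Return residues P<upstream>..P1 followed by P1'..P<downstream>'.

    `cut` is the 0-based index of the P1' residue, so the scissile bond lies
    between sequence[cut - 1] (P1) and sequence[cut] (P1'). Positions that
    fall outside the sequence are padded with '-'.
    """
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(cut - upstream, cut + downstream)
    )


def p1_counts(windows: Iterable[str], upstream: int = 4) -> Counter:
    """Count the residue at P1, the last position before the scissile bond."""
    return Counter(w[upstream - 1] for w in windows)


# Hypothetical toy polyprotein fragment and cleavage positions (not real data).
toy_sequence = "MSAVLQSGFRKTVTLQAGKG"
toy_cuts = [6, 16, 9]  # the third site deliberately has a non-Q residue at P1

windows = [cleavage_window(toy_sequence, c) for c in toy_cuts]
print(windows)             # ['AVLQSGFR', 'VTLQAGKG', 'QSGFRKTV']
print(p1_counts(windows))  # Counter({'Q': 2, 'F': 1})
```

In a classifier, each character of such a window would then be replaced by numeric features, for example AA-index values or their principal components. A non-Q P1, like the third toy site, is the kind of case the limitation says needs more experimental data.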
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.