SARS-CoV-2 PLpro whole human proteome cleavage prediction and enrichment/depletion analysis

Abstract

A novel coronavirus (SARS-CoV-2) has caused a pandemic that has killed millions of people. Worldwide vaccination and herd immunity are still far away, and few therapeutics are approved by regulatory agencies for widespread use. The coronavirus 3-chymotrypsin-like protease (3CLpro) is a commonly investigated target in COVID-19; however, less work has been directed toward the equally important papain-like protease (PLpro). PLpro is less well characterized because it makes fewer and more diverse cleavages in coronavirus proteomes and because it is assumed to modulate host pathways mainly through its deubiquitinating activity. Here, I extend my previous work on 3CLpro human cleavage prediction and enrichment/depletion analysis to PLpro.[1] Using three sets of neural networks trained on datasets drawn from different taxonomic ranks, with a maximum of 463 different putative PLpro cleavages, Matthews correlation coefficients of 0.900, 0.948, and 0.966 were achieved for Coronaviridae, Betacoronavirus, and Sarbecovirus, respectively. I predict that more than 1,000 human proteins may be cleaved by PLpro, depending on the diversity of the training dataset, and that many of these proteins are distinct from those previously predicted to be cleaved by 3CLpro. PLpro cleavages are similarly nonrandomly distributed and result in many annotations shared with 3CLpro cleavages, including ubiquitination, poly(A) tail and 5’ cap RNA binding proteins, helicases, and endogenous viral proteins. Combining PLpro with 3CLpro cleavage predictions, I performed an additional, novel enrichment analysis on known substrates of cleaved E3 ubiquitin ligases; the results indicate that many pathways, including viral RNA sensing, are affected indirectly by E3 ligase cleavage, independent of traditional PLpro deubiquitinating activity. As with 3CLpro, PLpro whole proteome cleavage prediction revealed many novel potential therapeutic targets against coronaviruses, although experimental verification is similarly required.
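
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the following is a minimal sketch of how it is computed from a binary confusion matrix. The function name and the example counts are hypothetical illustrations and are not taken from the article or its repository.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Return the MCC for a binary confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives, and false negatives (hypothetical inputs here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is taken as 0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a cleavage / non-cleavage classifier.
score = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
print(f"MCC = {score:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative when the classes are imbalanced, which is typically the case when cleavage sites are far rarer than non-cleavage sites.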

Article activity feed

  1. SciScore for 10.1101/2021.10.04.462902:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms

    Sentence: [23] Searching for all combinations of “orf/pp” and “1/1a/1ab” within the family Coronaviridae returned 12,770 different, complete polyproteins with 463 different cleavages manually discovered using the Clustal Omega multiple sequence alignment server.
    Resource: Clustal Omega
    suggested: (Clustal Omega, RRID:SCR_001591)

    Sentence: [28] Enrichment Analysis: Protein annotation, classification, and enrichment analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) 6.8.[29, 30] Tissue (UP_TISSUE and UNIGENE_EST_QUARTILE), InterPro, direct Gene Ontology (GO includes cellular compartment (CC), biological process (BP), and molecular function (MF)), Reactome pathways, sequence features, and keywords annotations were all explored, and only annotations with Benjamini-Hochberg-corrected p-values less than 0.05 were considered statistically significant.
    Resource: DAVID
    suggested: (DAVID, RRID:SCR_001881)
    Resource: InterPro
    suggested: (InterPro, RRID:SCR_006695)

    Sentence: Cleaved E3 ubiquitin ligases were matched to their respective substrates for additional enrichment analysis using the UbiNet 2.0 database.[31] All training data, prediction methods, and results can be found on GitHub (https://github.com/Luke8472NN/NetProtease).
    Resource: UbiNet
    suggested: None
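
The Benjamini-Hochberg correction mentioned in the enrichment sentence above can be illustrated with a short sketch. This is the generic step-up procedure, written under the assumption that raw p-values are already available as a list; it is not DAVID's own implementation, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four annotation terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

An annotation would then be kept only if its adjusted value falls below the 0.05 threshold quoted above.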

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.