Prediction of the receptorome for the human-infecting virome

Abstract

Virus receptors are key to the viral infection of host cells, yet identifying them remains challenging. Our previous study showed that human virus receptor proteins have some distinctive features, including a high N-glycosylation level, a large number of interaction partners and a high expression level. Here, a random-forest model was built to identify the human virus receptorome from human cell membrane proteins with acceptable accuracy. The model combines these distinctive features of human virus receptors with protein sequence information. A total of 1380 human cell membrane proteins were predicted to constitute the receptorome of the human-infecting virome. In addition, combining the random-forest model with human-virus protein-protein interactions predicted in previous studies enabled prediction of the receptors for 693 human-infecting viruses, such as enterovirus, norovirus and West Nile virus. To our knowledge, this study is the first attempt to predict the receptorome of the human-infecting virome, and it should greatly facilitate the identification of virus receptors.
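The abstract does not spell out how the random-forest predictions and the previously predicted human-virus PPIs were combined. The sketch below assumes the simplest reading: keep only those predicted PPIs whose human partner belongs to the predicted receptorome, then group them by virus. It is written in Python, the language the authors report using. The function name `candidate_receptors_per_virus` and all identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def candidate_receptors_per_virus(receptorome, virus_host_ppis):
    """Group predicted receptors by the virus predicted to interact with them.

    receptorome: iterable of human protein IDs predicted to be virus receptors.
    virus_host_ppis: iterable of (virus_name, human_protein_id) pairs taken
        from a separate human-virus PPI prediction (assumed input format).
    """
    receptor_set = set(receptorome)
    candidates = defaultdict(set)
    for virus, human_protein in virus_host_ppis:
        # A PPI partner counts as a candidate receptor only if the
        # receptorome model also predicted it to be a receptor.
        if human_protein in receptor_set:
            candidates[virus].add(human_protein)
    # Sorted lists give stable, readable output; viruses with no
    # receptor-like partner are simply absent from the result.
    return {virus: sorted(proteins) for virus, proteins in candidates.items()}
```

Suppose the input is `receptorome={"PROT_A", "PROT_B"}` together with the pairs `("virus_1", "PROT_A")`, `("virus_1", "PROT_C")` and `("virus_2", "PROT_B")`. The function then yields `PROT_A` for `virus_1` and `PROT_B` for `virus_2`.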

  1. SciScore for 10.1101/2020.02.27.967885:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences / Resources
    Human cell membrane proteins and human membrane proteins were obtained from the UniProtKB/Swiss-Prot database on February 21, 2020.
    UniProtKB/Swiss-Prot
    suggested: None
    For proteins without annotated N-glycosylation sites in the UniProtKB/Swiss-Prot database, N-glycosylation sites were predicted with NetNGlyc 1.0 (available at http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta et al., 2004).
    NetNGlyc
    suggested: (NetNGlyc, RRID:SCR_001570)
    To calculate the node degree of human proteins in the human PPI network, human PPIs with combined scores greater than 400 were first extracted from the STRING database (version 10.5) (Szklarczyk et al., 2015) and used to build the human PPI network.
    STRING
    suggested: (STRING, RRID:SCR_005223)
    Because gene expression levels in different tissues were strongly correlated, principal component analysis (PCA) was used to remove these correlations, using the PCA implementation in the scikit-learn package (version 0.21.3) (Pedregosa et al., 2011) in Python (version 3.6.7).
    scikit-learn
    suggested: (scikit-learn, RRID:SCR_002577)
    In addition, sequence redundancy in both human virus receptor proteins and human membrane proteins was removed using CD-HIT (version 4.8.1) (Fu et al., 2012) at a 70% identity threshold.
    CD-HIT
    suggested: (CD-HIT, RRID:SCR_007105)
    Five rounds of five-fold cross-validation were conducted to evaluate the predictive performance of the RF model, using StratifiedKFold from the scikit-learn package in Python.
    Python
    suggested: (IPython, RRID:SCR_001658)
    The RBPs of human-infecting viruses were compiled from three sources: the ViralZone database (Masson et al., 2012); the UniProtKB database, in which viral proteins are annotated with the GO terms “viral entry into host cell” or “virion attachment to host cell”; and the literature on viral RBPs.
    ViralZone
    suggested: (ViralZone, RRID:SCR_006563)
    UniprotKB
    suggested: (UniProtKB, RRID:SCR_004426)
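NetNGlyc, cited in the N-glycosylation row above, is a neural-network predictor and is not reproduced here. The sketch below is a simpler, swapped-in illustration of what such a predictor screens. It locates the N-X-S/T sequon, in which X is any residue except proline; the sequon is necessary but not sufficient for N-glycosylation. Using the sequon count as a proxy for a protein's N-glycosylation level is an assumption, and `sequon_positions` is a hypothetical name.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported;
# [^P] enforces the "any residue except proline" rule at position X.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """Return 1-based positions of the Asn in every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

For example, `sequon_positions("MNGTNPSNAS")` reports the asparagines at positions 2 and 8. The `NPS` motif at position 5 is skipped because of the proline.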
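The node-degree feature in the STRING row above can be sketched as follows. The sketch assumes a STRING `protein.links` file with whitespace-separated `protein1 protein2 combined_score` columns and a header line. The cut-off follows the quoted sentence (combined score greater than 400). STRING typically lists each interaction in both directions, so partners are stored in sets and each pair counts once. Function names are hypothetical, and the mapping from STRING identifiers to UniProt accessions is not shown.

```python
import gzip
from collections import defaultdict


def read_string_links(path):
    """Yield (protein1, protein2, combined_score) from a STRING links file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the "protein1 protein2 combined_score" header
        for line in handle:
            p1, p2, score = line.split()
            yield p1, p2, int(score)


def node_degrees(links, min_score=400):
    """Degree of each protein in the PPI network built from links above min_score."""
    partners = defaultdict(set)
    for p1, p2, score in links:
        # Strictly greater than the threshold, as in the quoted method.
        if score > min_score and p1 != p2:
            partners[p1].add(p2)
            partners[p2].add(p1)
    return {protein: len(neighbours) for protein, neighbours in partners.items()}
```

Proteins with no interaction above the cut-off do not appear in the result. A lookup such as `degrees.get(protein, 0)` assigns them a degree of zero.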
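The PCA step in the scikit-learn row above can be sketched as below. Three settings are assumptions rather than values reported in the quoted sentence: the log transform, the per-tissue standardization and the 90% explained-variance cut-off. `expression_components` is a hypothetical name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def expression_components(expression, variance_to_keep=0.9):
    """Project per-tissue expression onto uncorrelated principal components.

    expression: array of shape (n_proteins, n_tissues).
    variance_to_keep: assumed fraction of variance the kept components must explain.
    """
    # Log-transform (assumption) to compress the long right tail of expression values.
    logged = np.log1p(np.asarray(expression, dtype=float))
    # Standardize each tissue column so no single tissue dominates the components.
    scaled = StandardScaler().fit_transform(logged)
    # With 0 < n_components < 1 and svd_solver="full", scikit-learn keeps the
    # smallest number of components whose cumulative explained variance exceeds it.
    pca = PCA(n_components=variance_to_keep, svd_solver="full")
    components = pca.fit_transform(scaled)
    return components, pca
```

The returned components are mutually uncorrelated. They can therefore replace the strongly correlated per-tissue columns as model features.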
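The StratifiedKFold row above describes five repetitions of five-fold cross-validation. The sketch below shows one way to run this with scikit-learn. Reshuffling with a different `random_state` in each repetition is what makes the five partitions differ. The forest size, ROC AUC as the metric and the name `repeated_stratified_cv` are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def repeated_stratified_cv(X, y, n_repeats=5, n_splits=5, n_estimators=500, seed=0):
    """Return one ROC AUC per fold, n_repeats * n_splits values in total."""
    X = np.asarray(X)
    y = np.asarray(y)
    aucs = []
    for repeat in range(n_repeats):
        # A new shuffle seed per repeat gives a distinct partition; stratification
        # keeps the receptor / non-receptor ratio the same in every fold.
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed + repeat)
        for train_idx, test_idx in folds.split(X, y):
            model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
            model.fit(X[train_idx], y[train_idx])
            # Column 1 holds the probability of the positive class (label 1).
            scores = model.predict_proba(X[test_idx])[:, 1]
            aucs.append(roc_auc_score(y[test_idx], scores))
    return np.array(aucs)
```

scikit-learn also offers `RepeatedStratifiedKFold(n_splits=5, n_repeats=5)`, which generates the same kind of repeated, reshuffled splits from a single object.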

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This study has some limitations. First, the modeling used far fewer human virus receptor proteins than human membrane proteins, which may hinder accurate modeling. Under-sampling was therefore used to address this class imbalance. Second, the RF model performed only modestly in discriminating human virus receptor proteins from human membrane proteins, and further work is needed to improve it. Third, although the RF model can predict the receptorome of the human-infecting virome, it cannot identify the receptors for a specific human-infecting virus. Combining the RF model with PPI prediction models, such as the one in Lasso’s work, can help identify virus-receptor interactions. In conclusion, this study is the first to build a computational model for predicting the receptorome of the human-infecting virome. The results can facilitate the identification of human virus receptors in both computational and experimental studies.
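The limitations mention under-sampling to handle the class imbalance but do not describe the procedure. The sketch below assumes plain random under-sampling. Rows of the larger class (human membrane proteins) are dropped at random until it matches the size of the smaller class (receptors). The function name `undersample` is hypothetical.

```python
import numpy as np


def undersample(X, y, seed=0):
    """Randomly drop rows of larger classes so every class has equal size."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    kept = []
    for label in classes:
        idx = np.flatnonzero(y == label)
        # Sampling without replacement; the smallest class is kept whole.
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    # Sort so the retained rows keep their original order.
    kept = np.sort(np.concatenate(kept))
    return X[kept], y[kept]
```

When combined with cross-validation, this step would normally be applied to the training folds only. The test folds then keep the true class ratio.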

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.