Prediction of the receptorome for the human-infecting virome

Abstract

Virus receptors are key to the viral infection of host cells, yet identifying them remains challenging. Our previous study showed that human virus receptor proteins have distinctive features, including a high level of N-glycosylation, a large number of interaction partners and a high expression level. Here, a random-forest model was built to identify the human virus receptorome from human cell membrane proteins with acceptable accuracy, based on the combination of these distinctive features and protein sequences. A total of 1380 human cell membrane proteins were predicted to constitute the receptorome of the human-infecting virome. In addition, combining the random-forest model with protein-protein interactions between human and viruses predicted in previous studies enabled further prediction of the receptors for 693 human-infecting viruses, such as Enterovirus, Norovirus and West Nile virus. To our knowledge, this study is the first attempt to predict the receptorome for the human-infecting virome, and it should facilitate the identification of virus receptors.

SciScore for 10.1101/2020.02.27.967885:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Human cell membrane proteins and human membrane proteins were obtained from the UniProtKB/Swiss-Prot database on February 21, 2020.	UniProtKB/Swiss-Prot suggested: None
For those proteins without the annotation of N-glycosylation sites in the UniprotKB/Swiss-Prot database, their N-glycosylation sites were predicted with NetNGlyc 1.0 (available at http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta et al., 2004).	NetNGlyc suggested: (NetNGlyc, RRID:SCR_001570)
To calculate the node degree of the human proteins in the human PPI network, human PPIs with combined scores greater than 400 were first extracted from the STRING database (version 10.5) (Szklarczyk et al., 2015) and used to form the human PPI network.	STRING suggested: (STRING, RRID:SCR_005223)
Because gene expression levels in different tissues were strongly correlated, principal component analysis (PCA) was used to reduce the correlations, using the PCA function in the scikit-learn package (version 0.21.3) (Pedregosa et al., 2011) in Python (version 3.6.7).	scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
In addition, sequence redundancy in both human virus receptor proteins and human membrane proteins was removed using CD-HIT (version 4.8.1) (Fu et al., 2012) at the 70% identity level.	CD-HIT suggested: (CD-HIT, RRID:SCR_007105)
Five-fold cross-validation was repeated five times to evaluate the predictive performance of the RF model, using StratifiedKFold from the scikit-learn package in Python.	Python suggested: (IPython, RRID:SCR_001658)
The RBPs of human-infecting viruses were compiled from three sources: the ViralZone database (Masson et al., 2012); the UniProtKB database, in which viral proteins were annotated with the GO terms “viral entry into host cell” or “virion attachment to host cell”; and the literature on viral RBPs.	ViralZone suggested: (ViralZone, RRID:SCR_006563) UniprotKB suggested: (UniProtKB, RRID:SCR_004426)
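
The node-degree step quoted in the table above can be illustrated with a minimal sketch. It is not the authors' code: the function name `node_degrees`, the tuple layout of the edge list and the toy identifiers and scores are all assumptions made for illustration. Only the threshold (combined score greater than 400) comes from the text.

```python
from collections import defaultdict


def node_degrees(edges, min_score=400):
    """Count distinct interaction partners per protein.

    `edges` is an iterable of (protein_a, protein_b, combined_score)
    tuples (a hypothetical layout). Only edges whose score is strictly
    greater than `min_score` are kept, matching "greater than 400".
    """
    neighbours = defaultdict(set)
    for protein_a, protein_b, score in edges:
        if score > min_score and protein_a != protein_b:
            # Sets make the count robust to a pair being listed twice.
            neighbours[protein_a].add(protein_b)
            neighbours[protein_b].add(protein_a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


# Toy edges with made-up identifiers and scores (not STRING data).
toy_edges = [
    ("P1", "P2", 950),
    ("P2", "P1", 950),  # same pair in reverse order, counted once
    ("P1", "P3", 410),
    ("P2", "P3", 400),  # not greater than 400, so dropped
    ("P3", "P4", 150),  # below threshold, so dropped
]

degrees = node_degrees(toy_edges)
print(degrees)
```

Proteins with no edge above the threshold (here `P4`) do not appear in the result; whether such proteins should be assigned a degree of zero is a choice the quoted text does not specify.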

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

There are some limitations to this study. Firstly, the number of human virus receptor proteins was much smaller than that of human membrane proteins in the modeling, which may hinder accurate modeling. Thus, the under-sampling method was used to deal with the imbalance problem. Secondly, the performance of the RF model was modest in discriminating human virus receptor proteins from human membrane proteins. More effort is still needed to improve the model. Thirdly, although the RF model can be used to predict the receptorome of the human-infecting virome, it is not feasible to use the model to identify the receptors for a specific human-infecting virus. Combining the RF model with a model of PPI predictions, such as Lasso’s work, can help identify virus-receptor interactions. In conclusion, this study built, for the first time, a computational model for predicting the receptorome of the human-infecting virome. The results can facilitate the identification of human virus receptors in both computational and experimental studies.
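
The under-sampling idea mentioned in the first limitation can be sketched as follows. This is a generic random under-sampling of the majority class, not the authors' procedure: the 1:1 ratio, the fixed seed, the function name `undersample` and the toy protein identifiers are assumptions, since the text does not give these details.

```python
import random


def undersample(minority, majority, ratio=1.0, seed=0):
    """Randomly down-sample `majority` to about `ratio` times len(minority).

    Returns (minority, sampled_majority). The 1:1 default ratio and the
    seed are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    n_keep = min(len(majority), int(round(len(minority) * ratio)))
    # random.sample draws without replacement, so no item is repeated.
    return list(minority), rng.sample(list(majority), n_keep)


# Toy identifiers: 5 "receptor" proteins versus 50 other membrane proteins.
receptors = [f"R{i}" for i in range(5)]
membrane = [f"M{i}" for i in range(50)]

positives, negatives = undersample(receptors, membrane)
print(len(positives), len(negatives))
```

Because a single random subset discards most of the majority class, a common variant repeats the draw with different seeds and averages the resulting models; whether the study did so is not stated here.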

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Related articles

Meta-analysis of functional genomics studies reveals conserved cellular pathways required by viruses of pandemic concern

Human Cytomegalovirus Strain Specific Differences in Protein Expression of Type I IFN Pathway Proteins Do Not Impact Virus Replication.

Structure-based computational screening and molecular dynamics reveal potential inhibitors of Norovirus VP1 and RdRp Proteins: an in-silico study