Transfer learning via multi-scale convolutional neural layers for human–virus protein–protein interaction prediction

Abstract

Motivation

To complement experimental efforts, machine learning-based computational methods are playing an increasingly important role in predicting human–virus protein–protein interactions (PPIs). Furthermore, transfer learning can effectively apply prior knowledge obtained from a large source dataset/task to a small target dataset/task, improving prediction performance.

Results

To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron. Our architecture outperforms machine learning methods based on various feature encodings as well as state-of-the-art prediction methods. As our main contribution, we introduce two transfer learning methods (i.e. ‘frozen’ type and ‘fine-tuning’ type) that reliably predict interactions in a target human–virus domain based on training in a source human–virus domain, by retraining CNN layers. Finally, we use the ‘frozen’ type transfer learning approach to predict human–SARS-CoV-2 PPIs and find that our predictions are topologically and functionally similar to experimentally known interactions.
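As a rough illustration of the two transfer learning types, the toy sketch below trains a tiny Siamese model on a "source" set and then adapts it to a "target" set in two ways. It is not the authors' implementation: the abstract does not say which layers are frozen, so the sketch assumes the common convention that the ‘frozen’ type keeps the source-trained convolutional weights fixed and retrains only the prediction head, while the ‘fine-tuning’ type updates both. All names (`encode`, `predict`, `train`, `toy_pairs`), the layer sizes, the random profiles and labels, and the finite-difference gradients are assumptions chosen to keep the example dependency-free.

```python
import copy
import math
import random

K, D, N_KERNELS = 3, 4, 2  # toy kernel width, profile columns, kernel count (assumed sizes)

def encode(profile, conv):
    """Shared encoder: 1D convolution, ReLU, then global max pooling per kernel."""
    feats = []
    for j in range(N_KERNELS):
        kernel = conv[j * K * D:(j + 1) * K * D]
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(len(profile) - K + 1):
            s = sum(profile[start + i][d] * kernel[i * D + d]
                    for i in range(K) for d in range(D))
            best = max(best, s)
        feats.append(best)
    return feats

def predict(human, virus, params):
    """Siamese forward pass: both proteins pass through the same conv weights."""
    feats = encode(human, params["conv"]) + encode(virus, params["conv"])
    # The last entry of "head" is the bias; zip stops before it.
    z = params["head"][-1] + sum(w * f for w, f in zip(params["head"], feats))
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, data):
    """Mean binary cross-entropy over (human, virus, label) triples."""
    total = 0.0
    for human, virus, y in data:
        p = min(max(predict(human, virus, params), 1e-9), 1 - 1e-9)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(params, data, trainable, steps=20, lr=0.1, eps=1e-4):
    """Gradient descent that updates only the parameter groups in `trainable`.
    Gradients are central finite differences (slow, but fine for a toy model)."""
    params = copy.deepcopy(params)
    for _ in range(steps):
        grads = {}
        for group in trainable:
            grads[group] = []
            for i in range(len(params[group])):
                old = params[group][i]
                params[group][i] = old + eps
                up = loss(params, data)
                params[group][i] = old - eps
                down = loss(params, data)
                params[group][i] = old
                grads[group].append((up - down) / (2 * eps))
        for group in trainable:
            params[group] = [w - lr * g for w, g in zip(params[group], grads[group])]
    return params

def toy_pairs(rng, n):
    """Random stand-ins for sequence profiles, with random labels (hypothetical data)."""
    def profile(length):
        return [[rng.random() for _ in range(D)] for _ in range(length)]
    return [(profile(8), profile(6), rng.randint(0, 1)) for _ in range(n)]

rng = random.Random(0)
init = {"conv": [rng.uniform(0.0, 0.5) for _ in range(N_KERNELS * K * D)],
        "head": [rng.uniform(-0.5, 0.5) for _ in range(2 * N_KERNELS + 1)]}
source_data, target_data = toy_pairs(rng, 12), toy_pairs(rng, 4)

source_model = train(init, source_data, trainable=("conv", "head"))
frozen_model = train(source_model, target_data, trainable=("head",))             # 'frozen' type
finetuned_model = train(source_model, target_data, trainable=("conv", "head"))  # 'fine-tuning' type
```

In this sketch the only difference between the two types is the `trainable` tuple: the ‘frozen’ run leaves `frozen_model["conv"]` identical to `source_model["conv"]`, whereas the ‘fine-tuning’ run lets the convolutional weights drift toward the small target set.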

Availability and implementation

The source code and datasets are available at https://github.com/XiaodiYangCAU/TransPPI/.

Supplementary information

Supplementary data are available at Bioinformatics online.

Article activity feed

  1. SciScore for 10.1101/2021.02.16.431420:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Enrichment analysis: Module identification and functional analysis of the modules: In order to build the integrated interaction network for topological analysis, we first collected known protein interactions between the human proteins predicted to interact with SARS-CoV-2 from the HIPPIE database (Alanis-Lobato et al., 2017). | HIPPIE; suggested: (HIPPIE, RRID:SCR_014651)
    Visualizations of the modules (i.e., subnetworks) were carried out with Cytoscape (Shannon et al., 2003). | Cytoscape; suggested: (Cytoscape, RRID:SCR_003032)
    Enrichment analysis for each cluster was performed by using hypergeometric tests, where corresponding P-values were Bonferroni corrected, and only the five most enriched GO BP terms and KEGG pathways were considered (Adjusted P-value ≤ 0.05) in Figure 6. | KEGG; suggested: (KEGG, RRID:SCR_012773)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
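The enrichment step quoted in Table 2 (hypergeometric tests, Bonferroni-corrected P-values, a cutoff of adjusted P ≤ 0.05, and the five most enriched terms per cluster) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy gene sets, the background size, and the choice to use the number of tested terms as the Bonferroni multiplier are all assumptions.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) when n genes are drawn from a background
    of N genes, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrich(cluster, annotations, background_size, alpha=0.05, top=5):
    """Return up to `top` terms with Bonferroni-adjusted P <= alpha, most significant first."""
    n_tests = len(annotations)  # assumed Bonferroni multiplier: one test per term
    results = []
    for term, genes in annotations.items():
        overlap = len(cluster & genes)
        p = hypergeom_pvalue(overlap, len(cluster), len(genes), background_size)
        adjusted = min(1.0, p * n_tests)
        if adjusted <= alpha:
            results.append((term, overlap, adjusted))
    results.sort(key=lambda r: r[2])
    return results[:top]

# Hypothetical data: genes are integer IDs in a background of 100 genes.
cluster = set(range(10))
annotations = {
    "term_A": set(range(8)) | {50, 51},  # 8 of its 10 genes fall in the cluster
    "term_B": {9} | set(range(60, 69)),  # 1 of its 10 genes falls in the cluster
}
enriched = enrich(cluster, annotations, background_size=100)
```

With these toy sets, only `term_A` passes the adjusted cutoff; `term_B` overlaps the cluster by a single gene, which is about what chance alone would give.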