Learning universal knowledge graph embedding for predicting biomedical pairwise interactions


Abstract

Predicting biomedical interactions is crucial for understanding various biological processes and for drug discovery. Graph neural networks (GNNs) show promise in identifying novel interactions when extensive labeled data are available. However, labeling biomedical interactions is often time-consuming and labor-intensive, resulting in low-data scenarios. Furthermore, distribution shifts between training and test data in real-world applications challenge the generalizability of GNN models. Recent studies suggest that pre-training GNN models with self-supervised learning on unlabeled data can enhance their performance in predicting biomedical interactions. Here, we propose LukePi, a novel self-supervised pre-training framework that pre-trains GNN models on biomedical knowledge graphs (BKGs). LukePi is trained with two self-supervised tasks: topology-based node degree classification and semantics-based edge recovery. The former predicts the degree of a node from its topological context, and the latter infers both the type and the existence of a candidate edge by learning semantic information in the BKG. By integrating these two complementary tasks, LukePi effectively captures the rich information in the BKG, thereby enhancing the quality of node representations. We evaluate LukePi on two critical link prediction tasks, predicting synthetic lethality and drug-target interactions, using four benchmark datasets. In both distribution-shift and low-data scenarios, LukePi significantly outperforms 15 baseline models, demonstrating the power of the graph pre-training strategy when labeled data are sparse.
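To make the two pre-training signals concrete, here is a minimal sketch of how the training labels for each task could be derived from a toy BKG. The triples, node names, and sampling scheme are illustrative assumptions, not the paper's implementation:

```python
# Sketch of LukePi-style self-supervision targets on a toy biomedical
# knowledge graph (BKG). The graph and helper names are hypothetical.
from collections import Counter
import random

# Toy BKG: (head, relation, tail) triples.
triples = [
    ("geneA", "interacts_with", "geneB"),
    ("drugX", "targets", "geneA"),
    ("drugX", "treats", "diseaseY"),
    ("geneB", "associated_with", "diseaseY"),
]

def degree_labels(triples):
    """Topology task: label each node with its degree; the GNN is
    trained to predict this label from the node's local context."""
    deg = Counter()
    for head, _, tail in triples:
        deg[head] += 1
        deg[tail] += 1
    return dict(deg)

def edge_recovery_sample(triples, seed=0):
    """Semantics task: hide one edge so the model must recover both
    its existence and its relation type; an unconnected node pair
    serves as a negative ('no edge') example."""
    rng = random.Random(seed)
    held_out = rng.choice(triples)
    context = [t for t in triples if t != held_out]
    nodes = sorted({n for h, _, t in triples for n in (h, t)})
    existing = {(h, t) for h, _, t in triples}
    negatives = [
        (h, t) for h in nodes for t in nodes
        if h != t and (h, t) not in existing and (t, h) not in existing
    ]
    return context, held_out, rng.choice(negatives)

labels = degree_labels(triples)
print(labels["geneA"])  # geneA appears in two triples, so its label is 2
context, positive, negative = edge_recovery_sample(triples)
print(positive[1])      # the relation type the model must recover
```

In the actual framework, these labels supervise a GNN over the full BKG; the sketch only shows how complementary topological (degree) and semantic (edge type and existence) targets can be generated without any manual annotation.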
