Identification of Insulin Resistance-Related Genes Using Biomedical Knowledge Graphs Topology and Embeddings


Abstract

Background: Knowledge graph (KG) feature engineering approaches, such as calculation of topological features and generation of embeddings, can be applied to biomedical KGs (biomedKGs) to gain a better understanding of disease biology and identify novel gene-disease associations. However, evaluation of such approaches for studying not only disease associations but also complex patho-physiologies, such as insulin resistance (IR), is lacking. In this study, we used the OpenBioLink and Hetionet biomedKGs to predict IR-related genes using topological feature engineering, link prediction, Elkanoto, and outlier detection algorithms. We also evaluated how model performance was affected by the size of the training set and by enriching the biomedKG with IR information. Furthermore, we assessed the biological relation of the predictions to IR-related processes using the DepMap and Multiscale Interactome datasets and bioinformatic pathway functional annotations.

Results: We found that models using topological features from both standard and enriched OpenBioLink achieved the best predictive performance, followed closely by Elkanoto using RotatE embeddings from both enriched and standard biomedKGs. Additionally, we found that increasing the size of the training set improved performance more than enriching the biomedKGs with IR information. Our biological characterization showed that embeddings can capture the varied IR-related functions and broadly group them into those related to cell proliferation and those related to metabolism. Notably, the enriched functional pathways of the top predicted genes included Chagas disease, which has a debated relation to IR.

Conclusions: We comprehensively evaluated methods for identifying genes related to the complex patho-phenotype of IR. Our findings showed that biomedKG embeddings can capture complex biological information related to IR, without strong dependence on the specific schema of the biomedKG. Comparing biologically contextualized results, we found that embedding-based models had better generalization capabilities than the topology-based model, but there was a wide range of performance across the embedding-based models. Therefore, choosing a research approach should balance the need for accurate predictions against the possibility of discovering novel biological insights.
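
The abstract names RotatE embeddings but gives no implementation details. As a rough illustration only, the sketch below shows the general RotatE idea: entities are complex vectors, a relation is an element-wise rotation, and a triple scores higher when the rotated head lands close to the tail. The function name, the toy vectors, and the omission of the margin term are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def rotate_score(head, phase, tail):
    """RotatE-style plausibility score for a (head, relation, tail) triple.

    head, tail: complex-valued entity embeddings of equal length.
    phase: real-valued rotation angles that define the relation.
    Returns a non-positive number; values closer to zero mean "more plausible".
    The margin constant used during training is omitted (assumption).
    """
    rotation = np.exp(1j * phase)  # unit-modulus complex numbers, one per dimension
    # Sum of complex moduli of the element-wise difference (L1-style distance).
    return -np.sum(np.abs(head * rotation - tail))


# Toy data, purely hypothetical: a 4-dimensional embedding space.
rng = np.random.default_rng(0)
head = rng.normal(size=4) + 1j * rng.normal(size=4)
phase = rng.uniform(-np.pi, np.pi, size=4)

matching_tail = head * np.exp(1j * phase)          # exactly the rotated head
unrelated_tail = rng.normal(size=4) + 1j * rng.normal(size=4)

score_matching = rotate_score(head, phase, matching_tail)
score_unrelated = rotate_score(head, phase, unrelated_tail)
```

In this toy setup the matching tail scores essentially zero, while the unrelated tail scores lower. A link prediction model ranks candidate entities by such scores.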
