RNA Knowledge-Graph analysis through homogeneous embedding methods
Abstract
Motivation
We recently introduced RNA-KG, an ontology-based knowledge graph that integrates biological data on RNAs from over 60 public databases. RNA-KG captures functional relationships and interactions between RNA molecules and other biomolecules, chemicals, and biomedical concepts such as diseases and phenotypes, all represented within graph-structured bio-ontologies. We present the first comprehensive computational analysis of RNA-KG, evaluating the potential of graph representation learning and machine learning models to predict node types and edges within the graph.
Results
We ran node classification experiments to predict up to 81 distinct node types, together with both generic and specific edge prediction tasks. Generic edge prediction aimed to identify the presence of an edge irrespective of its type, while specific edge prediction targeted particular interactions between ncRNAs, e.g., miRNA-miRNA or siRNA-mRNA, or relationships between ncRNAs and biomedical concepts, e.g., miRNA-disease or lncRNA-Gene Ontology term relationships. Using embedding methods for homogeneous graphs, such as LINE and node2vec, in combination with machine learning models such as decision trees and random forests, we achieved balanced accuracy exceeding 90% for the 20 most common node types and over 80% for most specific edge prediction tasks. These results show that simple embedding methods for homogeneous graphs can successfully predict nodes and edges of RNA-KG. They pave the way for the discovery of novel ncRNA interactions and lay the foundation for further use of this rich information source to improve prediction accuracy and support research into the “RNA world”.
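As a minimal illustration of two ingredients mentioned above, the sketch below computes balanced accuracy (the mean of per-class recall) and builds an edge feature vector from two node embeddings. The element-wise (Hadamard) product is a common choice for edge representations, but it is an assumption here: the abstract does not state which operator was used. The node names, embedding values, and function names are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def hadamard(u, v):
    """Element-wise product of two node embeddings (assumed edge operator)."""
    return [a * b for a, b in zip(u, v)]


# Toy node-type labels: an imbalanced set where the classifier
# always predicts the majority class.
y_true = ["miRNA"] * 8 + ["lncRNA"] * 2
y_pred = ["miRNA"] * 10

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

# Hypothetical 3-dimensional embeddings, as a homogeneous method
# such as node2vec or LINE might produce for two nodes.
emb = {"miR-1": [0.5, -1.0, 2.0], "geneA": [2.0, 0.5, 0.25]}
edge_vec = hadamard(emb["miR-1"], emb["geneA"])

print(plain_acc, bal_acc, edge_vec)
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, which shows why balanced accuracy is the more informative metric when node types differ greatly in frequency. In a full pipeline, vectors like `edge_vec` would be passed to a classifier such as a decision tree or random forest.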
Code Availability
Python code to reproduce the experiments is available at https://github.com/AnacletoLAB/RNA-KG_homogeneous_emb_analysis