Graph Convolutional Network-Guided Inverse Link Prediction for Sparsification of Metal-Organic Framework Graphs in Large-Scale Cheminformatics
Abstract
The increasing availability of large-scale Metal-Organic Framework (MOF) datasets presents significant challenges for efficient analysis and knowledge extraction. In our previous work, we introduced a novel approach that represents MOF data as similarity graphs, enabling network-based analysis of MOFs. However, these graphs often become extremely dense, which limits both interpretability and scalability. In this work, we propose an Inverse Link Prediction (ILP)-based sparsification framework that leverages Graph Convolutional Networks (GCNs) to selectively prune redundant edges while preserving the community structure and domain knowledge encoded in the MOF graph. Beyond graph simplification, we rigorously validate the utility of the sparsified graphs by evaluating both graph-based and non-graph-based machine learning models. Specifically, we show that graph neural networks (GCN and GraphSAGE) still achieve competitive performance when trained on the sparsified graph, demonstrating that it retains informative structure. Additionally, we investigate non-graph-based classifiers (Gradient Boosting Trees, Logistic Regression, Naïve Bayes, and Deep Neural Networks) trained on feature vectors extracted from the sparsified graphs. Our results confirm that sparsification preserves sufficient information to support both types of machine learning approaches, making the proposed ILP method a robust tool for scalable, knowledge-preserving MOF analysis in big data cheminformatics.
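The abstract does not spell out the ILP algorithm, so the following is only a rough illustration under stated assumptions. It assumes that "inverse link prediction" means scoring every existing edge with a link predictor built on GCN-style node embeddings and removing the edges the predictor finds least plausible. To stay short, it swaps a trained GCN encoder for parameter-free propagation (the normalized adjacency applied K times to the features, as in Simplified Graph Convolution) and uses cosine similarity as the link score. The function names, the `keep_ratio` parameter, the "keep each node's best edge" safety rule, and the synthetic two-cluster similarity graph are all hypothetical, not taken from the paper.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_embeddings(adj: np.ndarray, features: np.ndarray, num_layers: int = 2) -> np.ndarray:
    """Assumption: parameter-free SGC-style stand-in for a trained GCN, A_norm^K X."""
    a_norm = normalized_adjacency(adj)
    h = features
    for _ in range(num_layers):
        h = a_norm @ h
    return h


def ilp_sparsify(adj: np.ndarray, features: np.ndarray,
                 keep_ratio: float = 0.5, num_layers: int = 2) -> np.ndarray:
    """Hypothetical ILP pruning: drop the existing edges with the lowest link scores.

    Keeps the top `keep_ratio` fraction of edges by cosine score, plus each
    connected node's best-scoring edge so that no node becomes isolated.
    """
    z = gcn_embeddings(adj, features, num_layers)
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    scores = z @ z.T  # cosine similarity used as the link-prediction score

    # Enumerate each existing undirected edge once (upper triangle).
    rows, cols = np.triu_indices_from(adj, k=1)
    present = adj[rows, cols] > 0
    rows, cols = rows[present], cols[present]
    n_keep = int(np.ceil(keep_ratio * len(rows)))
    top = np.argsort(-scores[rows, cols])[:n_keep]

    # Rebuild a symmetric adjacency containing only the retained edges.
    sparse = np.zeros_like(adj)
    sparse[rows[top], cols[top]] = adj[rows[top], cols[top]]
    sparse[cols[top], rows[top]] = adj[cols[top], rows[top]]

    # Safety net: every node that had edges keeps its single best edge.
    masked = np.where(adj > 0, scores, -np.inf)
    for i in np.flatnonzero(adj.sum(axis=1) > 0):
        j = int(np.argmax(masked[i]))
        sparse[i, j] = sparse[j, i] = adj[i, j]
    return sparse


# Usage on a synthetic dense similarity graph with two feature clusters.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0, 0, 0, 0], [0, 5.0, 0, 0, 0]])
labels = np.repeat([0, 1], 10)
x = centers[labels] + rng.normal(size=(20, 5))
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
adj = ((unit @ unit.T) > 0.5).astype(float)  # threshold cosine similarity
np.fill_diagonal(adj, 0.0)

sparse = ilp_sparsify(adj, x, keep_ratio=0.5)
print("edges before:", int(adj.sum() // 2), "after:", int(sparse.sum() // 2))
```

Keeping each node's best-scoring edge is one simple way to avoid creating isolated nodes during pruning. Whether the paper uses such a rule, a trained link predictor, or a different pruning threshold is not stated in the abstract.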