Localised Graph Neural Networks for Aqueous Solubility Prediction: A New Paradigm in QSPR Modelling
Abstract
Predicting aqueous solubility remains a key challenge in drug discovery due to its importance in absorption, distribution, metabolism, and elimination (ADME) properties. Recent advances in machine learning, particularly graph neural networks (GNNs), have set new benchmarks in quantitative structure–property relationship (QSPR) modelling. Existing methods, however, focus almost exclusively on global models that attempt to generalise across large chemical spaces. In this paper, we introduce a novel paradigm: localised GNN models trained on structurally similar molecules. We demonstrate that this approach outperforms state-of-the-art benchmarks on the AqSolDB dataset, achieving a root mean squared error (RMSE) of 0.903 compared to 1.459 for SolTranNet. We further provide the first large-scale quantitative study of the relationship between Tanimoto similarity and solubility difference, supporting the intuition that localised models can capture fine-grained structure–property dependencies relevant to iterative drug design. Our results establish localisation as a nontrivial and promising complement to global QSPR approaches.
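As a rough illustration of the two quantities the abstract relies on, the sketch below computes Tanimoto similarity between binary fingerprints and the RMSE of a set of predictions, then picks a "local" training set of molecules similar to a query. It is not the authors' pipeline: the abstract does not say how neighbours are chosen, so the set-of-on-bits fingerprint representation, the 0.5 threshold, the toy data, and the function names `tanimoto`, `local_training_set`, and `rmse` are all assumptions made for illustration.

```python
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def local_training_set(query_fp, library, threshold=0.5):
    """Keep (fingerprint, solubility) pairs at least `threshold` similar to the query.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [(fp, y) for fp, y in library if tanimoto(query_fp, fp) >= threshold]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up toy library: (on-bit indices, solubility value); not real data.
library = [
    (frozenset({1, 2, 3, 4}), -2.1),
    (frozenset({1, 2, 3, 9}), -2.4),
    (frozenset({7, 8}), 0.3),
]
query = frozenset({1, 2, 3, 5})

# Only the structurally similar entries would be used to train a local model.
neighbours = local_training_set(query, library, threshold=0.5)
```

In practice the fingerprints would come from a cheminformatics toolkit and the local model would be a GNN trained on the selected neighbours; the selection step itself reduces to a similarity filter of this kind.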