Activity Cliff-Informed Contrastive Learning for Molecular Property Prediction
Abstract
Accurately predicting molecular activity is hindered by activity cliffs, which are sharp potency changes between highly similar compounds that distort the smoothness assumed by modern QSAR and graph neural networks (GNNs). Here we introduce activity cliff awareness (AC-awareness), an inductive bias that reshapes GNN latent spaces to account for these discontinuities. Implemented through an Activity Cliff Awareness (ACA) loss combining regression with soft-margin triplet contrastive learning, the method dynamically mines high-value cliff triplets during training and corrects inconsistent neighbourhoods in latent space. This yields progressively fewer cliff violations, more coherent activity gradients, and substantially reduced label incoherence across diverse chemical spaces. Evaluated on 52 datasets spanning low-sample narrow-scaffold series, large mixed-scaffold benchmarks, matched-pair cliff classification, and ADMET delta property prediction, AC-awareness consistently improves predictive accuracy and outperforms strong ECFP- and GNN-based baselines. The approach generalizes across multiple GNN backbones and remains effective under fixed hyperparameters, demonstrating that cliff-aware contrastive geometry provides a robust, architecture-independent mechanism for mitigating structure–activity discontinuities. These results establish AC-awareness as a principled strategy for enhancing molecular property prediction by aligning latent representations with the nonadditive behaviour underlying activity cliffs.
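The abstract describes the ACA loss only at a high level: a regression term plus a soft-margin triplet term computed on dynamically mined activity-cliff triplets. The NumPy sketch below shows one way such an objective could be assembled. Several pieces are illustrative assumptions, not the authors' implementation. These include the mining rule (Tanimoto similarity on binary fingerprints, the label-gap thresholds `pos_thresh` and `neg_thresh`, the structural cutoff `sim_thresh`, and the `mine_margin` filter), the mean-absolute-error regression term, the weight `alpha`, and all function names. The sketch only computes loss values; training a GNN encoder would need the same logic in an autodiff framework.

```python
import numpy as np


def tanimoto_matrix(fp):
    """Pairwise Tanimoto similarity between rows of a binary fingerprint matrix."""
    fp = fp.astype(float)
    inter = fp @ fp.T                      # number of shared on-bits
    counts = fp.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def pairwise_dist(z):
    """Euclidean distances between all rows of an (n, d) embedding matrix."""
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    return np.sqrt(np.maximum(d2, 0.0))    # clip small negatives from round-off


def mine_cliff_triplets(z, y, sim, sim_thresh=0.5, pos_thresh=0.5,
                        neg_thresh=2.0, mine_margin=0.5):
    """Hypothetical cliff-triplet miner (thresholds are illustrative).

    positive p: label gap to anchor a below pos_thresh
    negative q: label gap above neg_thresh AND structurally similar to a
                (Tanimoto >= sim_thresh), i.e. an activity-cliff partner
    A triplet is kept only while the latent space still violates it:
    d(a, p) + mine_margin > d(a, q).
    """
    d = pairwise_dist(z)
    gap = np.abs(y[:, None] - y[None, :])
    n = len(y)
    triplets = []
    for a in range(n):
        pos = [p for p in range(n) if p != a and gap[a, p] < pos_thresh]
        neg = [q for q in range(n)
               if gap[a, q] > neg_thresh and sim[a, q] >= sim_thresh]
        for p in pos:
            for q in neg:
                if d[a, p] + mine_margin > d[a, q]:
                    triplets.append((a, p, q))
    return triplets


def soft_margin_triplet(z, triplets):
    """Mean soft-margin triplet loss, log(1 + exp(d(a,p) - d(a,q)))."""
    if not triplets:
        return 0.0
    a, p, q = (np.array(idx) for idx in zip(*triplets))
    d = pairwise_dist(z)
    # logaddexp(0, x) is a numerically stable softplus
    return float(np.mean(np.logaddexp(0.0, d[a, p] - d[a, q])))


def aca_loss(y_pred, y_true, z, fp, alpha=1.0, **mine_kwargs):
    """Assumed form: MAE regression term plus alpha-weighted triplet term."""
    mae = float(np.mean(np.abs(y_pred - y_true)))
    sim = tanimoto_matrix(fp)
    triplets = mine_cliff_triplets(z, y_true, sim, **mine_kwargs)
    return mae + alpha * soft_margin_triplet(z, triplets)


# Toy data: molecules 0, 1, 2 are structurally similar, but 2 is about
# 3 units more potent (a cliff); molecule 3 has a dissimilar scaffold.
fp = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
               [1, 1, 1, 1, 1, 0, 0, 0],
               [1, 1, 1, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 0, 1, 1, 1]])
y = np.array([5.0, 5.2, 8.0, 8.1])
z_tangled = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 0.0], [3.0, 0.0]])
z_separated = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 0.0], [5.0, 0.1]])

print(mine_cliff_triplets(z_tangled, y, tanimoto_matrix(fp)))
print(aca_loss(y, y, z_tangled, fp), aca_loss(y, y, z_separated, fp))
```

Keeping only margin-violating triplets is one plausible reading of "mining high-value cliff triplets". In the tangled embedding the cliff partner sits closer to the anchor than its same-activity neighbour, so four triplets are mined and the loss is positive even with perfect predictions. Once the embedding separates the activity levels, no triplets survive and the loss reduces to the regression term.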