Molecular Contrastive Learning with Graph Attention Network (MoCL-GAT) for Enhanced Molecular Representation
Abstract
Learning molecular representations is crucial for drug discovery but is often hindered by the scarcity of labeled experimental data, which limits the performance of supervised machine learning models. While self-supervised learning (SSL) offers a solution by leveraging vast unlabeled chemical databases, many existing methods learn from either local structural information or global molecular properties, but not both simultaneously. We introduce MoCL-GAT, a novel SSL framework based on contrastive and transfer learning that addresses this gap through two complementary objectives: a local contrastive task on molecular subgraphs that captures fine-grained chemical environments, and a global predictive task that learns holistic molecular descriptors. This dual-objective approach, powered by a Graph Attention Network, is designed to produce more robust, versatile, and transferable molecular representations. Pre-trained on 1.9 million compounds, MoCL-GAT was fine-tuned on diverse benchmarks, where it achieved state-of-the-art performance on molecular property prediction tasks: an AUROC of 0.928 on BBBP and 0.749 on SIDER, and top-ranking RMSEs of 0.570 on ESOL and 1.818 on FreeSolv. Critically, fine-tuned models consistently and significantly outperformed models trained from scratch, confirming the value of pre-training. These results validate that MoCL-GAT's dual-objective approach learns highly effective and transferable representations, enabling more accurate and data-efficient predictions for key cheminformatics challenges.
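To make the dual-objective design concrete, the following is a minimal sketch, not the authors' released implementation, of how a combined pre-training loss of this kind could be assembled around a Graph Attention Network encoder, assuming PyTorch Geometric. The names (DualObjectiveModel, nt_xent, pretrain_loss), the NT-Xent form of the contrastive term, the descriptor-regression form of the global term, and all hyperparameters (hidden size, attention heads, temperature, the mixing weight alpha) are illustrative assumptions; the abstract does not specify these details.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GATConv, global_mean_pool

    class DualObjectiveModel(torch.nn.Module):
        """Hypothetical sketch of a GAT encoder shared by a local
        contrastive head and a global descriptor-regression head."""

        def __init__(self, in_dim, hid_dim=128, heads=4, n_descriptors=200):
            super().__init__()
            # Two-layer Graph Attention Network shared by both objectives.
            self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
            self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)
            # Projection head for the local contrastive task on subgraphs.
            self.proj = torch.nn.Linear(hid_dim, hid_dim)
            # Regression head for the global descriptor-prediction task.
            self.desc = torch.nn.Linear(hid_dim, n_descriptors)

        def encode(self, x, edge_index, batch):
            h = F.elu(self.gat1(x, edge_index))
            h = self.gat2(h, edge_index)
            # Mean-pool node embeddings into one vector per (sub)graph.
            return global_mean_pool(h, batch)

    def nt_xent(z1, z2, tau=0.1):
        """NT-Xent contrastive loss: embeddings of matching subgraph
        views attract; all other pairs in the batch repel."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / tau
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)

    def pretrain_loss(model, view1, view2, mol, descriptors, alpha=0.5):
        # Local objective: contrast two augmented views of each subgraph
        # (view1/view2 are PyG Batch objects of paired subgraph views).
        z1 = model.proj(model.encode(view1.x, view1.edge_index, view1.batch))
        z2 = model.proj(model.encode(view2.x, view2.edge_index, view2.batch))
        local = nt_xent(z1, z2)
        # Global objective: predict holistic molecular descriptors for
        # the full molecule (mol is a PyG Batch of whole graphs).
        g = model.encode(mol.x, mol.edge_index, mol.batch)
        global_ = F.mse_loss(model.desc(g), descriptors)
        # Weighted sum of the two complementary objectives; alpha is an
        # assumed balancing knob, not a value reported in the abstract.
        return alpha * local + (1 - alpha) * global_

Sharing a single encoder across both heads, as sketched here, is what lets the local term shape fine-grained neighborhood features while the global term anchors whole-molecule semantics; the fine-tuning stage would then discard the pre-training heads and attach a task-specific head to the encoder.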