Generalized Molecular Latent Representation via Graph Latent Diffusion Autoencoder



Abstract

In recent years, deep neural networks (DNNs) have been applied to construct molecular latent representations for drug discovery. The quality of these representations, obtained using DNN encoders, affects the generalization performance of the model, i.e., its ability to predict molecular properties for previously unseen compounds. Given the vast space of potential organic compounds and the limited availability of data labeled with specific molecular properties, enhancing the generalization performance of predictive models is key to accelerating drug discovery. This requires the construction of effective molecular latent representations. With this in mind, this paper introduces the graph latent diffusion autoencoder (Graph LDA), a deep molecular generative model that combines a graph-transformer-based variational autoencoder with a latent-diffusion-based latent prior model, designed to construct generalized molecular representations through unsupervised learning. To assess the generalization performance of molecular property predictions based on the constructed representations, the results for Graph LDA were compared with those of existing models using the widely applicable information criterion (WAIC) and the widely applicable Bayesian information criterion (WBIC). The results indicated that Graph LDA outperformed the existing methods. Furthermore, we empirically demonstrated that the superior generalization performance of Graph LDA is attributable to the smoothness and multimodality of its learned molecular latent representation. The proposed robust framework for molecular property prediction holds significant potential for accelerating drug discovery and materials development.

Scientific contribution: This work introduces Graph LDA, a novel deep molecular generative model that combines a graph-transformer-based variational autoencoder with a latent-diffusion-based latent prior model. The proposed model extracts smooth molecular latent representations with multimodal distributions, resulting in high generalization performance for molecular property prediction. WAIC and WBIC analyses demonstrate that Graph LDA significantly outperforms existing representative unsupervised representation learning models.
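As a reading aid, the two-stage design described above (a variational autoencoder that maps molecules to latent vectors, plus a diffusion model fit to those latents as a learned prior) can be sketched roughly as follows. This is a hypothetical minimal sketch, not the authors' implementation: the MLP encoder/decoder, the denoiser, the feature and latent dimensions, and the noising schedule are placeholder assumptions standing in for the paper's graph-transformer VAE and latent-diffusion prior.

```python
# Hypothetical sketch of the two-stage idea in the abstract, NOT the paper's code:
# (1) a VAE maps (pooled) molecular features to latent vectors,
# (2) a diffusion-style denoiser is fit to those latents as a learned prior.
import torch
import torch.nn as nn

LATENT_DIM = 64
FEATURE_DIM = 128  # assumed size of a pooled molecular graph feature vector


class ToyVAE(nn.Module):
    """Stand-in for the graph-transformer VAE (encoder returns mean / log-variance)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(FEATURE_DIM, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT_DIM)
        self.logvar = nn.Linear(256, LATENT_DIM)
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, FEATURE_DIM))

    def encode(self, x):
        h = self.encoder(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


class LatentDenoiser(nn.Module):
    """Stand-in for the latent-diffusion prior: predicts the noise added to latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM + 1, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_DIM))

    def forward(self, z_noisy, t):
        # t is a scalar noise level in [0, 1], broadcast as an extra input feature
        t_feat = t.expand(z_noisy.size(0), 1)
        return self.net(torch.cat([z_noisy, t_feat], dim=-1))


# Stage 1 would train the VAE (reconstruction + KL) and then freeze the encoder.
# Stage 2 fits the denoiser on encoded latents with a noise-prediction loss.
vae, denoiser = ToyVAE(), LatentDenoiser()
x = torch.randn(8, FEATURE_DIM)              # dummy batch of pooled graph features
with torch.no_grad():
    mu, logvar = vae.encode(x)
    z = vae.reparameterize(mu, logvar)
t = torch.rand(1)
noise = torch.randn_like(z)
z_noisy = torch.sqrt(1 - t) * z + torch.sqrt(t) * noise  # toy noising schedule
loss = nn.functional.mse_loss(denoiser(z_noisy, t), noise)
print(loss.item())
```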
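For context on the evaluation criteria, WAIC and WBIC are Watanabe's standard model-assessment quantities; their usual definitions, stated in generic notation rather than the paper's, are:

```latex
% Standard definitions (Watanabe); notation is generic, not taken from the paper.
% E_w[.] denotes expectation over the posterior and E_w^beta[.] over the
% tempered posterior at inverse temperature beta = 1 / log n.
\begin{align}
  \mathrm{WAIC} &= -\frac{1}{n}\sum_{i=1}^{n} \log \mathbb{E}_w\!\left[p(x_i \mid w)\right]
                 + \frac{1}{n}\sum_{i=1}^{n} \mathbb{V}_w\!\left[\log p(x_i \mid w)\right], \\
  \mathrm{WBIC} &= \mathbb{E}_w^{\beta}\!\left[-\sum_{i=1}^{n} \log p(x_i \mid w)\right],
  \qquad \beta = \frac{1}{\log n}.
\end{align}
```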
