Expanding Chemical Space: Developing a Compound Generative Pre-trained Transformer for De Novo Drug Design



Abstract

The drug development process is time-consuming and costly. With the success of attention-based models across various domains, including drug development, their adoption has increased significantly. Pre-training a model on large datasets enhances its ability to capture the vast chemical space, enabling effective drug design. However, despite numerous generative models for drug discovery, challenges remain: (i) most current models are trained on fewer than 2 million compounds, which is insufficient to cover the vast chemical space and often results in limited interpretability; (ii) existing pre-trained models for compound generation struggle to represent the full diversity of potential molecular structures due to restricted training data. Here, we developed a generic compound generator that leverages an extensive chemical space. Training data derived from 200 million compounds in the ZINC20 database enabled the model to capture SMILES syntax and the key features of compounds. Using the attention mechanisms of the generative model, we obtained an interpretable view of chemical structures, ensuring robust performance across diverse chemical spaces. Our approach achieved 99% novelty, surpassing state-of-the-art methods in generating chemically valid and unique compounds.
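For context, the validity, uniqueness, and novelty figures cited above are conventionally computed as set ratios over canonicalized SMILES strings. The sketch below shows one common way to compute them with RDKit; it is illustrative, not the authors' code, and the function name `evaluate_generation` and its inputs are assumptions.

```python
# A minimal sketch (not the authors' implementation) of the standard
# validity / uniqueness / novelty metrics for generated SMILES.
# Assumes RDKit is installed; `generated` and `training_set` are placeholders.
from rdkit import Chem

def evaluate_generation(generated, training_set):
    """Return (validity, uniqueness, novelty) fractions.

    generated    -- list of SMILES strings sampled from the model
    training_set -- set of canonical SMILES seen during training
    """
    # Validity: fraction of strings RDKit can parse into a molecule,
    # re-emitted in canonical form so duplicates compare equal.
    valid = [Chem.MolToSmiles(m)
             for m in (Chem.MolFromSmiles(s) for s in generated)
             if m is not None]
    validity = len(valid) / len(generated) if generated else 0.0

    # Uniqueness: fraction of valid molecules that are distinct
    # after canonicalization.
    unique = set(valid)
    uniqueness = len(unique) / len(valid) if valid else 0.0

    # Novelty: fraction of unique molecules absent from the training set.
    novel = unique - training_set
    novelty = len(novel) / len(unique) if unique else 0.0

    return validity, uniqueness, novelty

# Toy usage: one invalid string, benzene is novel, ethanol is not.
print(evaluate_generation(["CCO", "c1ccccc1", "not_a_smiles"], {"CCO"}))
```

Under this convention, the reported 99% novelty would mean that 99% of the distinct valid molecules sampled from the model do not appear in the 200-million-compound training set.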
