Rag2Mol: structure-based drug design based on retrieval augmented generation
Abstract
Artificial intelligence (AI) has brought tremendous progress to drug discovery, yet identifying hit and lead compounds with optimal physicochemical and pharmacological properties remains a significant challenge. Structure-based drug design (SBDD) has emerged as a promising paradigm, but inherent data biases and neglect of synthetic accessibility leave SBDD models disconnected from practical drug discovery. In this work, we explore two methodologies, Rag2Mol-G and Rag2Mol-R, both based on retrieval-augmented generation, to design small molecules that fit a 3D pocket. The two methods either search the database for purchasable small molecules similar to the generated ones, or create new molecules that fit a 3D pocket from those in the database. Experimental results demonstrate that Rag2Mol methods consistently produce drug candidates with superior binding affinities and drug-likeness. We find that Rag2Mol-R provides broader coverage of the chemical landscape and more precise targeting capability than advanced virtual screening models. Notably, both workflows identified promising inhibitors for the challenging target protein tyrosine phosphatase PTPN2, which was once considered undruggable and still lacks inhibitors that have completed full clinical trials. Our highly extensible framework can integrate diverse SBDD methods, marking a significant advancement in AI-driven SBDD.