Discrimination vs. Generation: The Machine Learning Dichotomy for Dopaminergic Hit Discovery

Abstract

Virtual screening plays a pivotal role in early drug discovery and has traditionally been dominated by physics-based methods. While these approaches offer detailed insights, they are often hindered by high computational costs, limited sampling, and forcefield inaccuracies. Advances in Machine Learning (ML) and Deep Learning (DL) present resource-efficient alternatives, with approaches like predictive geometric ML (e.g., EQUIBIND) and generative geometric ML (e.g., diffusion models) showing promise in enhancing both efficiency and predictive capability. Here, we compare these two strategies, retrospectively and prospectively, for identifying novel agonists targeting the dopamine D2 receptor. To match DIFFDOCK’s dual functionality in protein-ligand conformer generation and confidence estimation, we paired EQUIBIND with an atom-type-based confidence model. This pipeline, termed the discriminative model, integrates a featurization step and an XGBoost classifier to differentiate between active and inactive ligands. The top-ranked compounds from both models were evaluated using an ultrafast dopaminergic biosensor assay, dLight. Our results demonstrate that the generative model achieved a higher hit rate, notably leading to the discovery of Compound 1, a nanomolar dopamine D2 receptor agonist with a novel scaffold.

The discovery of novel therapeutic agents is a cornerstone of drug development, and virtual screening has emerged as a vital tool in the early stages of this process. Traditional physics-based virtual screening methods, though robust, are often hindered by their substantial computational demands 1,2. These approaches typically require extensive candidate sampling, followed by rigorous scoring and refinement steps, making them resource-intensive and time-consuming 2. In response to these challenges, the field has increasingly turned to Machine Learning (ML) to streamline drug discovery 1,3.
ML offers the potential to predict molecular interactions more efficiently, reducing the computational burden associated with traditional methods 4,5. Two promising ML strategies have gained prominence: predictive geometric deep learning and generative geometric deep learning 6,7. Regression-based models, such as EQUIBIND 8, circumvent exhaustive sampling by directly predicting the final pose of a ligand. Generative models, such as DIFFDOCK 9, instead aim to generate near-native poses, offering a different route to hit discovery. Here, we compared both approaches in the context of a dopaminergic hit discovery task. The models employed consist of the following components.

• A discriminative model, which consists of a geometric graph neural network docking model (EQUIBIND 8) layered with an atomic interaction-based scoring function 10 and a classifier that differentiates active from inactive dopamine receptor ligands. We termed the combination of these layers the EFX pipeline.
• A generative model, which deploys a diffusion-based docking algorithm (DIFFDOCK 9) that generates and scores binding poses for dopamine receptor active ligands obtained from a ligand-based screening process (Fig. 1a).

Despite their potential, these ML methods have yet to be fully validated in experimental settings, particularly in the context of real-world drug discovery. This study seeks to bridge that gap by exploring the effectiveness of these ML approaches in identifying novel agonists for the dopamine D2 receptor, a target of significant therapeutic interest.

Dopamine receptors, members of the G-protein coupled receptor (GPCR) family, are crucial regulators of several neurological and physiological processes 11.
Predominantly found in the brain, dopamine receptors are integral to functions such as executive control 12, reward-motivation impulses 13, motor activity 14, and prolactin secretion 15, mediated through four major pathways: mesocortical 16, mesolimbic 17, nigrostriatal 18, and tuberoinfundibular 19. Dysregulation of these pathways is implicated in a range of disorders, including attention deficit hyperactivity disorder (ADHD) 16, addiction 17, depression 18, schizophrenia 10,18, Parkinson’s disease 18, and hyperprolactinemia 19. As such, the development of ligands that can selectively activate dopamine receptor subtypes, particularly the D1 and D2 receptors, holds significant therapeutic promise.

Our study focuses on leveraging advanced virtual screening techniques, deploying both discriminative and generative ML models, to discover novel agonists for the D2 receptor. By evaluating these models retrospectively and prospectively, we aim to assess their practical utility and contribute to the ongoing development of more efficient drug discovery pipelines. Through this research, we hope to uncover new chemotypes that could lead to innovative treatments for disorders related to dopamine dysregulation.
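
To make the featurization step of the discriminative pipeline more concrete, the following is a minimal sketch of one common way to build atom-type-based features from a docked pose: counting protein-ligand element pairs that fall within a distance cutoff. The text above does not specify the actual descriptor, cutoff, or atom typing used, so the function name `pair_count_features`, the 4.0 Å cutoff, the element list, and the toy coordinates are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# Hypothetical atom record: (element symbol, (x, y, z) coordinates in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def pair_count_features(
    protein: Sequence[Atom],
    ligand: Sequence[Atom],
    cutoff: float = 4.0,  # assumed contact cutoff, not taken from the source
    elements: Sequence[str] = ("C", "N", "O", "S"),  # assumed atom types
) -> List[int]:
    """Count protein-ligand element pairs whose distance is within `cutoff`.

    Returns a fixed-length vector ordered as (protein element, ligand element)
    over `elements`; pairs involving other elements are ignored.
    """
    counts: Counter = Counter()
    for p_el, p_xyz in protein:
        for l_el, l_xyz in ligand:
            if dist(p_xyz, l_xyz) <= cutoff:
                counts[(p_el, l_el)] += 1
    return [counts[(p, l)] for p in elements for l in elements]


# Toy pose: two pocket atoms and two ligand atoms (made-up coordinates).
toy_protein = [("N", (0.0, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))]
toy_ligand = [("C", (1.0, 0.0, 0.0)), ("N", (0.0, 3.0, 0.0))]
toy_features = pair_count_features(toy_protein, toy_ligand)
```

In a pipeline of the kind described, a vector like `toy_features` would be computed for each docked pose and passed, together with active/inactive labels, to a classifier such as XGBoost. A fixed-length count vector is convenient here because it does not depend on ligand size or atom ordering.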
