Benchmarking Real-World Applicability of Molecular Generative Models from De novo Design to Lead Optimization with MolGenBench
Abstract
Structure-based drug design (SBDD) has been profoundly reshaped by the advent of deep generative models, yet their practical impact on drug discovery remains limited. A central issue is the absence of a rigorous, application-oriented benchmark that mirrors the multi-stage, target-aware workflows of real-world pharmaceutical development. Inspired by recent advances in benchmarking for computer vision and large language models, where systematic evaluation has catalysed rapid progress, we introduce MolGenBench, a comprehensive benchmark designed to close the gap between molecular generation algorithms and tangible drug discovery outcomes. MolGenBench integrates a structurally diverse, large-scale dataset spanning 120 protein targets and 5,433 chemical series, comprising 220,005 experimentally confirmed active molecules. Beyond conventional de novo generation, it incorporates a dedicated hit-to-lead (H2L) optimization scenario, a critical phase of drug discovery that is seldom evaluated in existing benchmarks. We further introduce novel, pharmaceutically grounded metrics that assess a model’s ability both to rediscover target-specific actives and to progressively optimize compounds for potency. Through extensive evaluation, MolGenBench uncovers significant gaps between current generative models and the demands of real-world drug development, establishing a foundational resource for building generative models with enhanced practical impact and accelerated translational potential.