Assessing the Generalizability of Machine Learning and Physics-Based Methods with DNA-Encoded Libraries
Abstract
Predicting protein-ligand binding is a central challenge in computational drug discovery. Although machine learning (ML) and co-folding methods have advanced rapidly, their ability to generalize beyond their training or parameterization regimes remains poorly understood. DNA-encoded libraries (DELs) enable the simultaneous screening of billions of molecules, providing a useful testbed for evaluating these approaches at scale. A recent NeurIPS competition revealed that even top-performing ML models trained on DEL data failed to generalize to out-of-distribution (OOD) chemical space. We investigated whether integrating structural modeling could bridge this generalization gap. We systematically assessed state-of-the-art ML, docking, and co-folding methods, including Schrödinger Glide, Rosetta GALigandDock, and Boltz-2, on three biologically diverse protein targets screened against libraries containing multiple DEL synthesis formats. While ML excels in distribution, OOD hit discrimination depends on both the target and the ligand context, with no single method consistently dominating. These findings demonstrate that benchmark performance alone is insufficient to predict OOD performance, highlighting the need for system-dependent evaluation of binding prediction methods. We provide DEL-iver, an open-source package for assessing protein-ligand prediction methods and analyzing high-throughput screening data.