How many crystal structures do you need to trust your docking results?

Alexander Matthew Payne
Benjamin Kaminow
Hugo MacDermott-Opeskin
Iván Pulido
Jenke Scheen
Maria A Castellanos
Daren Fearon
John D. Chodera
Sukrit Singh

Abstract

Structure-based drug discovery generally requires predicting putative bound poses of protein:small molecule complexes so that candidate compounds can be prioritized for synthesis. The predicted structures feed a variety of downstream tasks, such as pose scoring or serving as a starting point for binding free energy estimation. The accuracy of downstream models depends on how well predicted poses match experimentally validated poses. Although experimental structures would be the ideal input to these downstream tasks, the time and cost required to collect new experimental structures for synthesized compounds make obtaining one for every input intractable. Thus, available structural data must be leveraged to extrapolate efficiently to new designs. Using data from the open science COVID Moonshot project, where nearly every compound synthesized was crystallographically screened, we assess several popular strategies for generating docked poses in a structure-enabled discovery program using both retrospective and prospective analyses. We explore the tradeoff between the cost of obtaining crystal structures and their utility for accurately predicting poses of newly designed molecules. We find that a simple strategy using molecular similarity to identify relevant structures for template-guided docking is successful in predicting poses for the SARS-CoV-2 main viral protease. Further efficiency analysis suggests that template-based docking of a scaffold series is a robust strategy even when the quantity of available structural data is limited. The resulting open-source pipeline and curated datasets should prove useful for automated modeling of bound poses for downstream scoring, machine learning, and free energy calculation tasks in structure-based drug discovery programs.
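
The similarity-based template selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the character n-gram fingerprint is a toy stand-in for a real chemical fingerprint, and the function names (`toy_fingerprint`, `tanimoto`, `pick_template`), the structure labels, and the SMILES strings are all hypothetical. The only idea it shows is ranking available crystal-structure ligands by Tanimoto similarity to a new design and choosing the closest one as the docking template.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def toy_fingerprint(smiles: str, n: int = 3) -> FrozenSet[str]:
    """Character n-grams of a SMILES string.

    ASSUMPTION: a crude stand-in for a real chemical fingerprint;
    it knows nothing about chemistry and is for illustration only.
    """
    if len(smiles) < n:
        return frozenset({smiles})
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity: |A and B| / |A or B|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pick_template(
    query_smiles: str, templates: Dict[str, str]
) -> Optional[Tuple[str, float]]:
    """Return (label, similarity) of the most similar template ligand.

    `templates` maps a hypothetical crystal-structure label to the
    SMILES of its bound ligand. Returns None if no templates exist.
    """
    if not templates:
        return None
    query_fp = toy_fingerprint(query_smiles)
    scored = [
        (tanimoto(query_fp, toy_fingerprint(smi)), label)
        for label, smi in templates.items()
    ]
    best_score, best_label = max(scored)  # ties resolve by label order
    return best_label, best_score


# Hypothetical example data, not taken from the preprint.
example_templates = {
    "xtal_A": "c1ccccc1C(=O)N",
    "xtal_B": "CCO",
}
best = pick_template("c1ccccc1C(=O)NC", example_templates)
```

In a real workflow the toy fingerprint would be replaced by a cheminformatics fingerprint or a substructure-based comparison, and the chosen structure would then guide the docking of the new molecule.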

Version published to 10.1101/2025.09.19.677428 on bioRxiv
Sep 24, 2025

Related articles

Rapid Assessment of Chemical Complementarity of Ligands for Protein Design

This article has 9 authors:
1. Derek Woolfson
2. Rokas Petrenas
3. Katarzyna Ożga
4. Joel Chubb
5. Andrey Romanyuk
6. Jennifer McManus
7. Graham Leggett
8. Nigel Scrutton
9. Tom Oliver
This article has no evaluations. Latest version: Dec 10, 2025

Are Energy and Forces Really Enough? Using Structure to Evaluate the Accuracy and Transferability of Machine Learning Potentials of Biomolecules

This article has 3 authors:
1. Lejla S. Biberić
2. Nisarg Joshi
3. Jim Pfaendtner
This article has no evaluations. Latest version: Jan 14, 2026

Integrating Computational Biology in Modern Drug Discovery: A Synergistic Approach of Structure-Based, Ligand-Based, and Network Pharmacology Strategies

This article has 4 authors:
1. Cromwel Tepap Zemnou
2. Gabriel Tchuente Kamsu
3. Ramelle Ngakam
4. Etienne Junior Tcheumeni
This article has no evaluations. Latest version: Jan 29, 2026
