Physics beats diffusion: Agentic AI-driven virtual screening benchmark on a GPCR target
Abstract
Virtual screening (VS) campaigns require expert decisions at every stage, from active compound curation and decoy generation to receptor preparation, docking engine selection, and statistical evaluation. I show that an autonomous large language model (LLM) coding agent (Claude Code, Anthropic) can design and execute a complete VS benchmark pipeline without human coding intervention, requiring only high-level scientific direction. The agent curated 1,000 FPR2 (a G protein-coupled receptor) actives from ChEMBL (pChEMBL ≥ 5), generated ca. 10,000 property-matched decoys, prepared ligand libraries using two protocols (naive defaults and expert-guided), and configured and ran docking with two fundamentally different engines: (1) Uni-Dock (GPU-accelerated, physics-based) and (2) DiffDock (diffusion-based machine learning). It then performed full statistical evaluation, including ROC AUC, BEDROC, enrichment factors, DeLong tests, and paired bootstrap confidence intervals. Uni-Dock achieved ROC AUC = 0.70–0.73 with significant discrimination (permutation p < 0.0001), while DiffDock confidence scores yielded near-random performance (AUC = 0.54–0.56; negligible Cliff's delta), consistent with the known underrepresentation of GPCR targets in its training data. Expert-guided protocols improved Uni-Dock AUC by +0.020 (DeLong p = 0.003; paired bootstrap p = 0.002). Single-ligand redocking confirmed that Vina reproduces the crystal pose (RMSD 0.22–0.39 Å), whereas both Uni-Dock batch mode (5.2–5.7 Å) and DiffDock (23–29 Å) failed. All code, data, and the agent's skill file are openly available. Scientific contribution: This is the first demonstration of an LLM coding agent autonomously constructing a reproducible VS benchmark from scratch.
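The enrichment metrics named above can be sketched in a few lines of pure Python. This is an illustrative sketch, not the agent's actual code: ROC AUC is computed here via the rank-based Mann–Whitney convention (ties count 0.5), and the enrichment factor at a chosen fraction of the ranked list assumes higher score = better; function names and the tie-handling choice are the author's own assumptions.

```python
def roc_auc(scores_active, scores_decoy):
    """Rank-based ROC AUC: fraction of (active, decoy) pairs the model
    orders correctly (Mann-Whitney U / (n_a * n_d)); ties count 0.5."""
    wins = 0.0
    for a in scores_active:
        for d in scores_decoy:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(scores_active) * len(scores_decoy))

def enrichment_factor(scores_active, scores_decoy, frac=0.01):
    """EF at the top `frac` of the ranked list (higher score = better):
    hit rate in the top slice divided by the overall active rate."""
    labeled = [(s, 1) for s in scores_active] + [(s, 0) for s in scores_decoy]
    labeled.sort(key=lambda t: t[0], reverse=True)
    n_top = max(1, int(round(frac * len(labeled))))
    hits = sum(label for _, label in labeled[:n_top])
    return (hits / n_top) / (len(scores_active) / len(labeled))
```

On a perfectly separating score list, `roc_auc` returns 1.0 and the EF at any fraction up to the active rate equals its maximum value, 1/frac capped by the active rate; an AUC near 0.5, as observed for DiffDock here, means the scores order active/decoy pairs no better than chance.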
The resulting benchmark provides the first head-to-head comparison of Uni-Dock and DiffDock on a GPCR target, revealing that physics-based docking (AUC = 0.70–0.73) substantially outperforms diffusion-based ML docking (AUC = 0.54–0.56, near-random) for this underrepresented target class.
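The paired bootstrap used to compare the two engines' AUCs can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's implementation: compounds are resampled with replacement jointly for both score lists, so each replicate compares the engines on the same resampled library, and the 95% interval is read from the percentiles of the AUC differences.

```python
import random

def roc_auc(pairs):
    """pairs: (score, label) with label 1 = active, 0 = decoy."""
    act = [s for s, y in pairs if y == 1]
    dec = [s for s, y in pairs if y == 0]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in act for d in dec)
    return wins / (len(act) * len(dec))

def paired_bootstrap(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """95% bootstrap CI for AUC(A) - AUC(B). Compounds are resampled
    jointly so the comparison between engines stays paired."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if 0 < sum(labels[i] for i in idx) < n:  # need both classes
            a = roc_auc([(scores_a[i], labels[i]) for i in idx])
            b = roc_auc([(scores_b[i], labels[i]) for i in idx])
            diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]
```

A confidence interval for the AUC difference that excludes zero (equivalently, a small bootstrap p-value, as in the +0.020 expert-guided improvement reported above) indicates the two protocols differ beyond resampling noise on this library.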