Target X built on Groovy: An unbiased and robust ligandability prediction model built and evaluated on non-redundant, well-curated dataset

Neha Vithani
David Wych
She Zhang
Phu Tang
Alex Demidov
A. Geoffrey Skillman
David N. LeBard

Abstract

Reliable assessment of the likelihood that a site on a protein surface binds a drug-like ligand is important for identifying ligandable pockets, ranking those pockets, and prioritizing protein targets for drug discovery campaigns. The use of global protein similarity metrics to define pocket redundancy has resulted in ligandability prediction models with strong biases toward ligand-binding sites that are overrepresented in the training set, and data leakage between the training and validation data naturally occurs. Here, we present a new framework that uses pocket similarity measurements to build truly non-redundant training and validation datasets of pockets, and we use it to construct a protein pocket dataset, Groovy. We also introduce a robust ligandability prediction model, Target X, trained and validated on Groovy, that differentiates non-ligandable from ligandable surfaces with 91% accuracy and 97% precision. We further apply Target X to detect pockets on protein surfaces and rank them by ligandability, with high success rates as measured by three different evaluation metrics. We believe our data curation framework should be considered a ‘best practice’ for any prediction and analysis model that relies on a dataset of ligand-binding pockets.
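
The idea of splitting by pocket similarity rather than by global protein similarity can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how pocket similarity is computed, which clustering method is used, or what threshold applies, so the similarity matrix, the 0.7 threshold, the single-linkage grouping, and the function names `cluster_by_similarity` and `split_by_cluster` below are all hypothetical. The sketch only shows the general principle that similar pockets are grouped first and each group is then assigned wholesale to either training or validation, so that no validation pocket has a close analogue in the training set.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_by_similarity(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Single-linkage grouping (hypothetical choice): two pockets share a
    cluster if a chain of pairs with similarity >= threshold connects them."""
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    return [find(i) for i in range(n)]


def split_by_cluster(labels: Sequence[int], val_fraction: float) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to validation until the target size is reached;
    every remaining cluster goes to training."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    target = val_fraction * len(labels)
    train: List[int] = []
    val: List[int] = []
    # Visit larger clusters first; a cluster is never divided across splits.
    for members in sorted(clusters.values(), key=len, reverse=True):
        if len(val) + len(members) <= target:
            val.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(val)


# Toy pairwise pocket-similarity matrix (made-up values, not from Groovy).
sim = [
    [1.00, 0.90, 0.30, 0.10, 0.20],
    [0.90, 1.00, 0.80, 0.10, 0.10],
    [0.30, 0.80, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.85],
    [0.20, 0.10, 0.10, 0.85, 1.00],
]
threshold = 0.7  # hypothetical redundancy cutoff
labels = cluster_by_similarity(sim, threshold)
train_idx, val_idx = split_by_cluster(labels, val_fraction=0.4)
```

In this toy case pockets 0 and 2 are only weakly similar to each other, yet both land in the same cluster through pocket 1, so all three stay on the same side of the split. A random or protein-level split could separate them and leak near-duplicate pockets into validation.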

Version published to 10.21203/rs.3.rs-8833408/v1 on Research Square
Feb 12, 2026

Related articles

Does DrugCLIP Find the Right Pocket? A Systematic Evaluation of Binding-Site Identification Across 42 Drug Targets

This article has 4 authors:
1. Bocheng Xie
2. Xiaokang Guo
3. Pengwei Xiao
4. Chao Yang
Latest version Apr 2, 2026

Enhanced sampling and ligandability assessment to expand the repertoire of potentially druggable cryptic pockets

This article has 8 authors:
1. Neha Vithani
2. She Zhang
3. Judith Günther
4. Hans Purkey
5. J. David Lawson
6. Anthony Nicholls
7. A. Geoffrey Skillman
8. David N. LeBard
Latest version Feb 16, 2026

Benchmarking Molecular Representations for Aqueous Solubility Prediction: The Impact of Inductive Bias and Scaffold Splitting in Low-Data Regimes

This article has 1 author:
1. Mudassir Ur Rahman
Latest version Mar 23, 2026
