Target X built on Groovy: An unbiased and robust ligandability prediction model built and evaluated on non-redundant, well-curated dataset
Abstract
Reliable assessment of the likelihood that a site on a protein surface binds a drug-like ligand is important for identifying ligandable pockets, ranking those pockets, and prioritizing protein targets for drug discovery campaigns. The use of global protein similarity metrics to define pocket redundancy has resulted in ligandability prediction models with strong biases toward ligand-binding sites that are overrepresented in the training set, and data leakage between the training and validation data naturally occurs. Here, we present a new framework that uses pocket similarity measurements to build truly non-redundant training and validation datasets of pockets, and we use it to construct a protein pocket dataset, Groovy. We also introduce a robust ligandability prediction model, Target X, trained and validated on Groovy, that is capable of distinguishing ligandable from non-ligandable surfaces with 91% accuracy and 97% precision. We further apply Target X to detect pockets on protein surfaces and rank them by ligandability, with high success rates as measured by three different evaluation metrics. We believe our data curation framework should be considered a ‘best practice’ for any prediction and analysis model that relies on a dataset of ligand-binding pockets.
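To illustrate the general idea of a pocket-level, leakage-free split, the following is a minimal sketch and not the authors' implementation. It assumes that pairwise pocket similarity scores are already available, links any two pockets whose score meets a threshold, and assigns each resulting cluster entirely to either training or validation. The function names, the similarity values, and the 0.7 threshold are all hypothetical; the abstract does not specify the similarity measure or the clustering procedure used to build Groovy.

```python
from typing import Dict, List, Tuple


def cluster_by_similarity(
    n_pockets: int,
    similarities: Dict[Tuple[int, int], float],
    threshold: float,
) -> List[int]:
    """Single-linkage clustering via union-find.

    Two pockets end up in the same cluster if they are connected by a chain
    of pairs whose similarity is >= threshold (an assumed redundancy rule).
    """
    parent = list(range(n_pockets))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in similarities.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    return [find(i) for i in range(n_pockets)]


def split_by_cluster(
    labels: List[int], validation_fraction: float
) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to train or validation, never splitting a cluster."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    target = validation_fraction * len(labels)
    train: List[int] = []
    validation: List[int] = []
    # Fill validation with the smallest clusters first, up to the target size.
    for members in sorted(clusters.values(), key=len):
        if len(validation) + len(members) <= target:
            validation.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(validation)


# Hypothetical similarity scores between six pockets (indices 0..5).
example_similarities = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.75, (4, 5): 0.2}
example_threshold = 0.7  # assumed cutoff, not taken from the preprint

labels = cluster_by_similarity(6, example_similarities, example_threshold)
train_idx, val_idx = split_by_cluster(labels, validation_fraction=0.5)
```

Because clusters are kept whole, no pair of pockets above the threshold can straddle the two sets, which is the property a split based only on global protein similarity does not guarantee.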