PocketBagger: Generalizable pocket druggability prediction via positive–unlabeled learning

Phillip W Gingrich
Ansuman Biswas
Ioan L Mica
Kevin M Brammer
Zhigang Shu
David S Maxwell
Kaitlyn P Russell
Bissan Al-Lazikani

Abstract

Reliable structure-based prediction of small-molecule druggability is hindered by a fundamental labeling problem: experimentally confirmed liganded sites (positives) are observable, but credible “undruggable” pockets (negatives) are almost impossible to define. Standard supervised machine learning consequently relies on arbitrary definitions of “undruggable”, leading to bias and false negatives. Here we introduce PocketBagger, a positive–unlabeled (PU) learning framework for pocket druggability prediction trained exclusively on experimentally determined Protein Data Bank¹ (PDB) structures. PocketBagger uses PU bagging to learn key features associated with reliable “druggable” pockets and treats all remaining pockets in the structurally characterized proteome as unlabeled. We demonstrate PocketBagger by training a simple Random Forest classifier and show that it achieves strong recall (0.804), even when challenged with increasingly difficult generalizability assessments and entire protein-family holdouts. We benchmark PocketBagger against a leading deep-learning predictor to demonstrate the added value of PU learning; PocketBagger itself is intended as a framework that can be used with any model architecture. Along with the code, the data generated by PocketBagger are deployed in canSAR.ai, providing scalable, generalizable pocket druggability predictions to the drug discovery community.
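
To make the PU bagging idea in the abstract concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes the generic PU bagging recipe: in each round, a bootstrap sample of unlabeled points is treated as provisional negatives, a base classifier is trained against the known positives, and only the unlabeled points left out of that sample are scored. The function names, the two-dimensional toy features, and the bootstrap size are all hypothetical. To keep the example dependency-free, a nearest-centroid scorer stands in for the Random Forest mentioned in the abstract.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid_scorer(pos, neg):
    """Stand-in base learner (assumption): a positive score means the
    point lies closer to the positive centroid than to the negative one."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Fraction of out-of-bag rounds in which each unlabeled point
    was classified as positive."""
    rng = random.Random(seed)
    k = len(positives)  # bootstrap size matched to the positives (assumption)
    votes = [0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Bootstrap unlabeled points and treat them as provisional negatives.
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(bag)
        scorer = fit_centroid_scorer(positives, [unlabeled[i] for i in bag])
        # Score only the points that were left out of this round's bag.
        for i, x in enumerate(unlabeled):
            if i not in in_bag:
                votes[i] += 1 if scorer(x) > 0 else 0
                counts[i] += 1
    return [v / c if c else None for v, c in zip(votes, counts)]


# Toy data (hypothetical): labeled positives, plus an unlabeled pool that
# mixes hidden positives with points that resemble true negatives.
_rng = random.Random(1)


def blob(cx, cy, n):
    return [[_rng.gauss(cx, 0.3), _rng.gauss(cy, 0.3)] for _ in range(n)]


positives = blob(1.0, 1.0, 30)
hidden_pos = blob(1.0, 1.0, 10)
likely_neg = blob(-1.0, -1.0, 40)
unlabeled = hidden_pos + likely_neg

scores = pu_bagging_scores(positives, unlabeled)
print("mean score, hidden positives:", fmean(scores[:10]))
print("mean score, likely negatives:", fmean(scores[10:]))
```

In this sketch, unlabeled points that resemble the known positives accumulate high out-of-bag scores even though no point was ever labeled as a definitive negative. That is the property the abstract relies on when it treats the rest of the structurally characterized proteome as unlabeled rather than undruggable.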

Version published to 10.64898/2026.05.15.725505 on bioRxiv
May 19, 2026

AE-PocketMiner Uses Attention to Simultaneously Predict Cryptic Pockets and Their Allosteric Coupling

This article has 5 authors:
1. Si Zhang
2. Prajna Mishra
3. Devin Kelly
4. Rachit Kumar
5. Gregory R. Bowman
This article has no evaluations. Latest version: May 23, 2026

A Drug–Target Specificity Foundation Model for Off-target Prediction, Repurposing, and Generative Design

This article has 1 author:
1. Sai T. Reddy
This article has no evaluations. Latest version: Jun 8, 2026

OracleScreen-LILRB4: Machine Learning-Guided Discovery of Myeloid Immune Checkpoint Binders Validated in Patient-Derived Cells

This article has 2 authors:
1. Somaya A. Abdel-Rahman
2. Moustafa T. Gabr
This article has no evaluations. Latest version: Jun 21, 2026
