Illuminating the Druggable Proteome with an AI Protein Profiling Platform
Abstract
Mapping reversible and covalent protein-ligand binding sites across the proteome would transform our understanding of protein function and accelerate therapeutic discovery. However, current approaches face significant challenges. The chemical proteomic technology activity-based protein profiling (ABPP) is extremely powerful, but it is constrained by incomplete proteome coverage and data heterogeneity. Existing machine learning (ML) models are limited by their dependence on structural input, inadequate training data, and a lack of rigorous methods for handling heterogeneous experimental labels. To address these challenges, we developed AiPP, an AI protein profiling platform that predicts ligand-binding residues directly from protein sequence, including covalent and reversible ligand-binding sites and disordered molecular recognition motifs that fold upon binding. AiPP is powered by protein large language models (pLLMs) and two newly curated comprehensive databases: LigCys-ABPP (cysteine-directed covalent liganding events quantified by ABPP) and LigBind3D (reversible binding sites in co-crystal structures). LigCys-ABPP comprises >700,000 cysteine-site records from 15 ABPP studies covering >10,000 human proteins. We developed a pLLM representation-based clustering framework followed by consensus analysis to reconcile and augment heterogeneous experimental annotations. Two complementary iterative data expansion protocols were implemented to enhance model performance and generalization. AiPP recovers 80% (Top-1) of all cysteine liganding events from the Protein Data Bank with 78% precision and an AUPRC of 84%. Furthermore, it recapitulates the consistently and heterogeneously liganded cysteines analyzed by a recent study using a large number of cancer cell lines. AiPP enables proteome-scale prediction of ligandability, laying the foundation for a comprehensive atlas of protein-ligand interactions and systematic discovery of druggable sites.
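For readers less familiar with the metrics quoted in the abstract, the following minimal Python sketch shows how precision, recall, and an AUPRC estimate are commonly computed from per-residue prediction scores. It is not AiPP's evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical, and average precision is used here as one common estimator of AUPRC. The abstract does not define its Top-1 recovery criterion, so this sketch does not attempt to reproduce it.

```python
from typing import List, Tuple


def precision_recall(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when residues scoring >= threshold are called liganded."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Average precision: mean of the precision measured at each true positive's rank.

    Ties in score are broken by input order, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    # Rank residues from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` residues
    return total / n_pos


# Hypothetical toy data: 1 = experimentally liganded cysteine, 0 = not liganded.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

toy_precision, toy_recall = precision_recall(toy_labels, toy_scores, threshold=0.5)
toy_ap = average_precision(toy_labels, toy_scores)
```

Precision and recall depend on the chosen threshold, whereas average precision summarizes the whole ranking, which is why the two kinds of figure are usually reported together.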