Evolutionary and geometric signatures reveal ligand-binding sites across proteomes
Abstract
Identifying protein binding sites is central to drug discovery, yet many computational approaches still sacrifice precision, recall, or throughput when scaled. We introduce PickPocket, a deep learning model that fuses evolutionary information from protein sequences with geometric representations of protein structure to identify ligand-binding residues at proteome scale. By combining complementary sequence context and spatial neighborhoods, PickPocket generalizes across diverse protein families and ligand chemistries while operating in a recall-oriented regime with competitive precision. In benchmark evaluations it achieves strong residue-level recovery and, despite not being explicitly trained on conformational switching, reliably identifies cryptic pockets in held-out structures, comparing favorably with specialized approaches. Applied to 356,711 proteins, the method nominates previously unannotated candidate sites enriched for functional signals and highlights tractable surface chemistry on therapeutically relevant targets. These results position evolutionary-geometric fusion as a practical foundation for large-scale binding-site mapping that can shorten the path from structure to experiment and support hit discovery, mutagenesis design, and target assessment.
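The abstract does not describe PickPocket's architecture, so the following is only a hypothetical, minimal sketch of what "fusing evolutionary information with geometric representations" and "a recall-oriented setting" can mean at the residue level. It assumes each residue has a sequence-derived feature vector (for example, a conservation score) and one 3D coordinate. It concatenates each residue's own features with the mean features of residues within a distance cutoff plus a neighbor count, scores residues with a logistic model, and picks the highest threshold that reaches a target recall. Every function name, the 8 Å default cutoff, and the weights are illustrative assumptions, not the authors' method; a deep model would learn such representations rather than use hand-set weights.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def neighbors_within(coords: Sequence[Coord], radius: float) -> List[List[int]]:
    """For each residue, indices of the other residues within `radius` (e.g. Angstroms)."""
    return [
        [j for j, cj in enumerate(coords) if j != i and math.dist(ci, cj) <= radius]
        for i, ci in enumerate(coords)
    ]


def fuse_features(seq_feats: Sequence[Sequence[float]], coords: Sequence[Coord],
                  radius: float = 8.0) -> List[List[float]]:
    """Hypothetical fusion: own sequence features + mean neighbor features + neighbor count."""
    nbrs = neighbors_within(coords, radius)
    dim = len(seq_feats[0])
    fused = []
    for own, idx in zip(seq_feats, nbrs):
        if idx:
            mean = [sum(seq_feats[j][k] for j in idx) / len(idx) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated residue: no spatial context
        fused.append(list(own) + mean + [float(len(idx))])
    return fused


def logistic_scores(fused: Sequence[Sequence[float]], weights: Sequence[float],
                    bias: float) -> List[float]:
    """Per-residue binding probability from a linear model (stand-in for a learned network)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(weights, f)) + bias)))
            for f in fused]


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> float:
    """Highest score threshold whose recall on labeled residues reaches `target_recall`."""
    positives = sum(1 for y in labels if y)
    if positives == 0:
        raise ValueError("need at least one labeled binding residue")
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        if tp / positives >= target_recall:
            return t
    return min(scores)  # only reached when target_recall > 1


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Residue-level precision and recall when predicting 'binding' for score >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: two spatial clusters of residues along the x-axis.
    coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0), (23, 0, 0)]
    feats = [[0.9], [0.8], [0.7], [0.1], [0.2]]  # e.g. conservation scores
    labels = [1, 0, 1, 0, 0]
    fused = fuse_features(feats, coords, radius=4.0)
    scores = logistic_scores(fused, weights=[2.0, 2.0, 0.5], bias=-3.0)
    t = threshold_for_recall(scores, labels, target_recall=1.0)
    p, r = precision_recall(scores, labels, t)
    print(f"precision {p:.3f}, recall {r:.1f}")  # precision 0.667, recall 1.0
```

In this toy run, demanding full recall forces the threshold down to the weakest true binder, which admits one false positive; that is the kind of precision cost a recall-oriented operating point accepts.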