Sequence-based Drug-Target Complex Pre-training Enhances Protein-Ligand Binding Process Predictions Tackling Crypticity
Abstract
Predicting protein-ligand binding processes, such as affinity and kinetics, is critical for accelerating drug discovery. However, existing computational methods are constrained by three key limitations: insufficient integration of comprehensive databases, inadequate representation of protein structural dynamics, and incomplete modeling of microscale protein-ligand interactions. To address these challenges, we introduce ProMoSite, a sequence-based model that connects protein and molecular foundation models to dynamically model microscale protein-ligand interactions through pre-training on binding site annotations in drug-target complexes. Notably, ProMoSite surpasses or matches state-of-the-art methods in identifying exposed and cryptic binding sites, while eliminating the need for three-dimensional structural inputs. Building upon ProMoSite's pre-training, we develop ProMoBind, a sequence-based model efficiently fine-tuned for protein-ligand affinity and kinetics predictions, which leverages the predicted binding configurations from ProMoSite and its ability to tackle binding site crypticity. ProMoBind outperforms baselines on both tasks while remaining computationally efficient. The modeling capabilities of ProMoSite, combined with ProMoBind's downstream performance, highlight the potential of this sequence-based pre-training and fine-tuning framework for broad applications in drug discovery.
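To make the idea of sequence-based binding site prediction more concrete, the following is a minimal sketch, not ProMoSite's actual architecture, which the abstract does not specify. It assumes that a protein foundation model yields one embedding per residue and a molecular foundation model yields one embedding per ligand; both are replaced here by random vectors. The projection matrices `w_protein` and `w_ligand`, the scaled dot-product score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def project(vec, weights):
    """Multiply a vector by a weight matrix given as a list of rows (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def binding_site_probabilities(residue_embs, ligand_emb, w_protein, w_ligand):
    """Score every residue against the ligand in a shared embedding space.

    residue_embs: one vector per residue (stand-in for protein model output)
    ligand_emb:   one vector for the ligand (stand-in for molecular model output)
    Returns one probability per residue (scaled dot product followed by a sigmoid).
    """
    lig = project(ligand_emb, w_ligand)
    scale = math.sqrt(len(lig))
    probs = []
    for emb in residue_embs:
        res = project(emb, w_protein)
        score = sum(r * l for r, l in zip(res, lig)) / scale
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


def random_matrix(rows, cols, rng):
    """Small random matrix used in place of learned weights or real embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 6 residues, 8-dim protein and 5-dim ligand embeddings.
rng = random.Random(0)
n_residues, d_protein, d_ligand, d_shared = 6, 8, 5, 4
residue_embs = random_matrix(n_residues, d_protein, rng)
ligand_emb = [rng.gauss(0.0, 1.0) for _ in range(d_ligand)]
w_protein = random_matrix(d_shared, d_protein, rng)
w_ligand = random_matrix(d_shared, d_ligand, rng)

probs = binding_site_probabilities(residue_embs, ligand_emb, w_protein, w_ligand)
# Residues whose probability reaches the (assumed) threshold form the predicted site.
predicted_site = [i for i, p in enumerate(probs) if p >= 0.5]
```

Because the score depends on both the residue and the ligand, the same protein sequence can yield different predicted sites for different ligands, and no three-dimensional structure is required as input. In a trained model the weights would be learned from binding site annotations rather than drawn at random.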