Learning Binding Affinities via Fine-tuning of Protein and Ligand Language Models
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Accurate in-silico prediction of protein-ligand binding affinity is essential for efficient hit identification in large molecular libraries. Commonly used structure-based methods such as giga-docking often fail to rank compounds effectively, and free energy-based approaches, while accurate, are too computationally intensive for large-scale screening. Existing deep learning models struggle to generalize to new targets or drugs, and current evaluation methods do not reflect real-world performance accurately. We introduce BALM , a deep learning framework that predicts b inding a ffinity using pretrained protein and ligand l anguage m odels. BALM learns experimental binding affinities by optimizing cosine similarity in a shared embedding space. We also propose improved evaluation strategies with diverse data splits and metrics to better assess model performance. Using the BindingDB dataset, BALM shows strong generalization to unseen drugs, scaffolds, and targets. It excels in few-shot scenarios for targets such as USP7 and Mpro , outperforming traditional machine learning and docking methods, including AutoDock Vina. Adoption of the target-based evaluation methods proposed will allow for more stringent evaluation of machine learning-based scoring tools. Frameworks such as BALM show good performance, are computationally efficient, and are highly adaptable within this evaluation setting, making them practical tools for early-stage drug discovery screening.