Explainable protein–protein binding affinity prediction via fine-tuning protein language models

Harshit Singh
Rajeev Kumar Singh
Satya Pratik Srivastava
Suryavedha Pradhan
Rohan Gorantla

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Predicting protein–protein binding affinity from sequence alone remains a bottleneck for anti-body optimization, biologics design and large-scale affinity modelling. Structure-based methods achieve high accuracy but cannot scale when complex structures are unavailable. Here we present a framework that reframes affinity prediction as metric learning: two proteins are projected into a shared latent space in which cosine similarity directly correlates with experimental binding affinity, and the protein language model encoder is adapted through parameter-efficient finetuning (PEFT). On the PPB-Affinity benchmark, the model achieves Pearson r = 0.89 on a random split, generalises to evolutionarily distant proteins ( r = 0.61 at < 30% sequence identity) and surpasses structure-based deep learning baselines across biological subgroups, without any three-dimensional input. On the strictly de-overlapped AB-Bind dataset, few-shot adaptation with 30% of assay data (Pearson r = 0.756, RMSE = 0.688) out-performs methods trained on 90% of data; consistent gains are observed across nine diverse AbBiBench deep-mutational-scanning assays with 10–30% labelled variants. Residue-level explainability reveals that the model concentrates importance on interface-localised residues aligned with experimentally validated interaction hotspots across enzyme–inhibitor, and antibody–antigen systems. Together, these results establish a scalable, explainable and data-efficient route to protein-protein binding affinity prediction and therapeutic antibody optimisation from sequence alone.

Version published to 10.64898/2026.03.30.715237 on bioRxiv
Apr 1, 2026

GL-E2EATP: improving protein-ATP binding residue prediction using global and local embedding of protein language model

This article has 7 authors:
1. Bing Rao
2. Jie Bai
3. Maha A. Thafar
4. Somayah Albaradei
5. Kamran Arshad
6. Apilak Worachartcheewanh
7. Muhammad Arif
This article has no evaluationsLatest version Mar 26, 2026
AINN-P1: A Compact Sequence-Only Protein Language Model Achieves Competitive Fitness Prediction on ProteinGym

This article has 3 authors:
1. Roger Wang
2. Kevin Jin
3. Lurong Pan
This article has no evaluationsLatest version Mar 30, 2026
IDBSpred: An intrinsically disordered binding site predictor using machine learning and protein language model

This article has 2 authors:
1. Drew Jones
2. Yinghao Wu
This article has no evaluationsLatest version Mar 30, 2026

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

GL-E2EATP: improving protein-ATP binding residue prediction using global and local embedding of protein language model

AINN-P1: A Compact Sequence-Only Protein Language Model Achieves Competitive Fitness Prediction on ProteinGym

IDBSpred: An intrinsically disordered binding site predictor using machine learning and protein language model