ProBASS—a language model with sequence and structural features for predicting the effect of mutations on binding affinity

Sagara N S Gurusinghe
Yibing Wu
William DeGrado
Julia M Shifman

Abstract

Motivation

Protein–protein interactions (PPIs) govern virtually all cellular processes, and a single mutation within a PPI can significantly impact protein functionality, potentially leading to diseases. While numerous approaches have emerged to predict changes in the free energy of binding due to mutations (ΔΔGbind), most lack precision. Recently, protein language models (PLMs) have shown powerful predictive capabilities by leveraging both sequence and structural data from protein complexes, yet they have not been optimized specifically for ΔΔGbind prediction.

Results

We developed an approach, ProBASS (Protein Binding Affinity from Structure and Sequence), to predict the effects of mutations on ΔΔGbind using two of the most advanced PLMs, ESM2 and ESM-IF1, which incorporate sequence and structural features, respectively. We first generated embeddings for each PPI mutant from the two PLMs and then fine-tuned ProBASS by training on a large dataset of experimental ΔΔGbind values. When training and testing were done on the same PPI, ProBASS achieved correlations with experimental ΔΔGbind values of 0.83 ± 0.05 and 0.69 ± 0.04 for single and double mutations, respectively. Additionally, when evaluated on a dataset of 2,325 single mutations across 131 PPIs, ProBASS reached a correlation of 0.81 ± 0.02, substantially outperforming other PLMs in predictive accuracy. Our results demonstrate that refining pre-trained PLMs with extensive ΔΔGbind datasets across multiple PPIs is a successful approach for creating a precise and broadly applicable ΔΔGbind prediction model, facilitating future protein engineering and design studies. ProBASS’s accuracy could be further improved through additional training as more experimental data become available.
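The general workflow described above (combine sequence-derived and structure-derived embeddings, fit a regressor to experimental ΔΔGbind values, report a correlation with its spread) can be illustrated with a minimal sketch. This is not the ProBASS implementation: the embeddings here are random stand-ins rather than ESM2 or ESM-IF1 outputs, the embedding sizes are arbitrary, the ridge regressor and 5-fold cross-validation are assumptions chosen for brevity, and all function names are hypothetical.

```python
import numpy as np


def concat_embeddings(seq_emb, struct_emb):
    """Join per-mutant sequence and structure feature vectors side by side."""
    return np.concatenate([seq_emb, struct_emb], axis=1)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def cross_validated_pearson(X, y, n_folds=5, alpha=1.0, seed=0):
    """Return mean and standard deviation of Pearson r over held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = fit_ridge(X[train], y[train], alpha)
        scores.append(pearson_r(X[test] @ w + b, y[test]))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-ins (assumption): random vectors instead of real PLM embeddings,
# and ddG values generated from a noisy linear model instead of experiments.
rng = np.random.default_rng(42)
n_mutants = 400
seq_emb = rng.normal(size=(n_mutants, 32))     # placeholder for sequence features
struct_emb = rng.normal(size=(n_mutants, 16))  # placeholder for structure features
X = concat_embeddings(seq_emb, struct_emb)
true_w = rng.normal(size=X.shape[1])
ddg = X @ true_w + rng.normal(scale=0.5, size=n_mutants)

mean_r, std_r = cross_validated_pearson(X, ddg)
print(f"Pearson r = {mean_r:.2f} ± {std_r:.2f}")
```

The mean ± standard deviation printed at the end has the same form as the correlations quoted in the abstract, although the abstract does not state how its spread was computed, and the numbers produced on synthetic data carry no meaning for real PPIs.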

Availability and implementation

ProBASS is available at: https://colab.research.google.com/github/sagagugit/ProBASS/blob/main/ProBASS.ipynb.

Version published to 10.1093/bioinformatics/btaf270
May 1, 2025
Version published to 10.1101/2024.06.21.600041 on bioRxiv
Jun 26, 2024

