Evolutionary constraints improve protein large language model predictions for protein stability, binding regions and epistasis

Konstantina Tzavella
Catharina Olsen
Wim Vranken

Abstract

Our understanding of protein function and evolution is largely based on the relationship between amino acid sequence and overall fold, now effectively captured by computational models. Yet predicting how mutations, whose effects are shaped by epistasis, alter protein behavior remains difficult, especially in dynamic or structurally ambiguous regions. Here we present D2D, which combines a self-supervised protein language model with protein-specific evolutionary information to predict mutational effects using little to no task-specific labeled data. D2D captures long-range epistatic interactions and, without being trained on the task, accurately predicts the effects of single and higher-order mutations on protein thermostability and binding. When fine-tuned, D2D outperforms state-of-the-art methods on latent driver cancer mutations and co-occurring proliferation-enhancing mutations across independent experimental studies. Unlike most existing approaches, D2D avoids biases linked to solvent accessibility or to multiple sequence alignment depth and quality, making it particularly effective for disordered or surface binding regions where structure-based predictors typically falter. Overall, D2D provides a general framework for modeling mutational effects in proteins with limited experimental or structural information.

Version published to 10.64898/2026.05.22.726784 on bioRxiv
May 26, 2026

Related articles

Just Add Structure: Protein Language Models Combined with Structural Equivariance Excel at Protein Tasks

This article has 5 authors:
1. Qurat-ul-ain
2. Carlos Outeiral
3. Matteo Cagiada
4. Yee Whye Teh
5. Charlotte M. Deane
This article has no evaluations. Latest version May 29, 2026

Cross-Attention Over RNA And Protein Sequences Enables Generalizable Interaction Prediction

This article has 7 authors:
1. Mario Catalano
2. Gerardo Pepe
3. Gabriele Ausiello
4. Claire McWhite
5. Giorgio Gambosi
6. Manuela Helmer Citterich
7. Pier Federico Gherardini
This article has no evaluations. Latest version Apr 23, 2026

Discriminative Site-Directed Protein Engineering via Lightweight CASPE Platform

This article has 10 authors:
1. Qiufeng Deng
2. Jie Qiao
3. Chuan Wang
4. Xinyue Ni
5. Yongyao Chang
6. Nan Zhao
7. Rui Zhai
8. Haiyang Cui
9. Xiujuan Li
10. Mingjie Jin
This article has no evaluations. Latest version Apr 28, 2026
