Protein Language Models Capture Structural and Functional Epistasis in a Zero-Shot Setting
Abstract
Protein language models (PLMs) learn from large collections of natural sequences and achieve striking success across prediction tasks, yet it remains unclear what biological principles underlie their representations. We use epistasis, the dependence of a mutation’s effect on its sequence context, as a lens to probe what PLMs capture about proteins. Comparing PLM-derived scores with deep mutational scanning data, we find that epistasis emerges naturally from pretrained models, without supervision on experimental fitness. Raw model scores align with residue–residue contacts, indicating that PLMs internalize structural proximity. Applying a nonlinear transformation to bring model outputs onto the experimental scale, however, shifts the signal toward functional couplings between distant sites. These findings show that PLMs capture both structural and functional dependencies from sequence data alone, and that epistasis provides a powerful window into the biological principles embedded in their representations.
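To make the zero-shot setup concrete, below is a minimal sketch of how an epistasis score can be derived from PLM log-probabilities alone. It assumes `log_probs` arrays of per-position amino-acid log-probabilities obtained from a pretrained model (e.g., a masked language model scored with the wild-type sequence, and again with the first mutation installed in the context); the function names, the additive baseline, and the placeholder random inputs are illustrative, not the paper's exact procedure, and the nonlinear rescaling onto the experimental scale is a separate step not shown here.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AA)}


def mutation_score(log_probs, wt_seq, pos, mut_aa):
    """Zero-shot effect of one substitution: log P(mut) - log P(wt) at `pos`."""
    return log_probs[pos, AA_INDEX[mut_aa]] - log_probs[pos, AA_INDEX[wt_seq[pos]]]


def epistasis_score(log_probs_wt, log_probs_ctx, wt_seq, mut1, mut2):
    """
    Epistasis as deviation from additivity: score the second mutation once
    against the wild-type context (`log_probs_wt`) and once against a context
    that already contains the first mutation (`log_probs_ctx`); the difference
    measures how much the first mutation changes the model's assessment of
    the second.
    """
    pos2, aa2 = mut2
    s2_independent = mutation_score(log_probs_wt, wt_seq, pos2, aa2)
    s2_in_context = mutation_score(log_probs_ctx, wt_seq, pos2, aa2)
    return s2_in_context - s2_independent


# Example usage with random placeholder log-probabilities standing in for
# real PLM outputs (shape: sequence length x 20 amino acids).
rng = np.random.default_rng(0)
L = 120
wt = "".join(rng.choice(list(AA), size=L).tolist())
lp_wt = np.log(rng.dirichlet(np.ones(20), size=L))   # wild-type context
lp_ctx = np.log(rng.dirichlet(np.ones(20), size=L))  # context with mutation 1 installed
eps = epistasis_score(lp_wt, lp_ctx, wt, (10, "A"), (57, "W"))
print(f"epistasis score: {eps:.3f}")
```

In this reading, a score near zero means the model treats the two substitutions as independent, while large positive or negative values indicate coupling between the sites, which is the quantity compared against deep mutational scanning measurements.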