Understanding Language Model Scaling on Protein Fitness Prediction

Abstract

Protein language models, and models that incorporate structure or homologous sequences, estimate sequence likelihoods p(sequence) that reflect the protein fitness landscape and are commonly used in mutation effect prediction and protein design. It is widely believed in deep learning that larger models perform better across tasks. For fitness prediction, however, language model performance declines beyond a certain size, raising concerns about scalability. Here, we show that model size, training data, and stochastic elements can bias the predicted p(sequence) away from real fitness. Model performance on fitness prediction depends on how well p(sequence) matches evolutionary patterns in homologs, which is best achieved at a moderate p(sequence) level for most proteins. At extreme predicted wild-type sequence likelihoods, models assign uniformly low or high likelihoods to nearly all mutations and fail to reflect the real fitness landscape. Notably, larger models tend to assign higher p(sequence) to a given protein, which can exceed this moderate range and thus reduce performance. Our findings clarify the scaling behavior of protein models on fitness prediction and provide practical guidelines for their application and future development.
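The abstract does not specify a scoring procedure, but a common way such likelihoods are used for mutation effect prediction is to compare the model's predicted probability of the mutant versus the wild-type amino acid at the mutated position. The sketch below illustrates this masked-marginal style of scoring with a protein language model; the checkpoint name (facebook/esm2_t6_8M_UR50D), the example sequence, and the mutation are illustrative assumptions, not details taken from this paper.

```python
# Minimal sketch: score a single point mutation with a masked protein
# language model by comparing log-probabilities of mutant vs. wild type.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed checkpoint for illustration only; any masked protein LM works similarly.
model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical wild-type sequence
pos = 4                                       # 0-based position of the mutation
wt_aa, mut_aa = "Y", "F"                      # hypothetical Y5F mutation

# The ESM tokenizer prepends a <cls> token, so sequence index i maps to token i + 1.
inputs = tokenizer(wt_seq, return_tensors="pt")
masked_ids = inputs["input_ids"].clone()
masked_ids[0, pos + 1] = tokenizer.mask_token_id

with torch.no_grad():
    logits = model(input_ids=masked_ids,
                   attention_mask=inputs["attention_mask"]).logits

# Log-probabilities over the vocabulary at the masked position.
log_probs = torch.log_softmax(logits[0, pos + 1], dim=-1)
wt_id = tokenizer.convert_tokens_to_ids(wt_aa)
mut_id = tokenizer.convert_tokens_to_ids(mut_aa)

# Positive score: mutation predicted more likely than wild type; negative: less likely.
score = (log_probs[mut_id] - log_probs[wt_id]).item()
print(f"{wt_aa}{pos + 1}{mut_aa} predicted effect score: {score:.3f}")
```

Summing such per-position scores, or the sequence-level log-likelihood itself, gives the p(sequence)-style estimates whose scaling behavior the paper examines.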
