From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models

Charles W. J. Pugh
Paulina G. Nuñez-Valencia
Mafalda Dias
Jonathan Frazer

This article has been Reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

Generative models trained on natural sequences are increasingly used to predict the effects of genetic variation, enabling progress in therapeutic design, disease risk prediction, and synthetic biology. In the zero-shot setting, variant impact is estimated by comparing the likelihoods of sequences, under the assumption that likelihood serves as a proxy for fitness. However, this assumption often breaks down in practice: sequence likelihood reflects not only evolutionary fitness constraints, but also phylogenetic structure and sampling biases, especially as model capacity increases. We introduce Likelihood-Fitness Bridging (LFB), a simple and general strategy that improves variant effect prediction by averaging model scores across sequences subject to similar selective pressures. Assuming an Ornstein-Uhlenbeck model of evolution, LFB can be viewed as a way to marginalize the effects of genetic drift, although its benefits appear to extend more broadly. LFB applies to existing protein and genomic language models without requiring retraining, and incurs only modest computational overhead. Evaluated on large-scale deep mutational scans and clinical benchmarks, LFB consistently improves predictive performance across model families and sizes. Notably, it reverses the performance plateau observed in larger protein language models, making the largest models the most accurate when combined with LFB. These results suggest that accounting for phylogenetic and sampling biases is essential to realizing the full potential of large sequence models in variant effect prediction.
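
As a rough illustration of the averaging idea described in the abstract, the sketch below averages a per-sequence variant score over aligned homologs. It is not the authors' implementation. The scoring callable, its `(sequence, position, mutant)` signature, the gap character, and the use of an unweighted mean are all assumptions made for this example. In practice the scoring function would wrap a protein or genome language model.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical signature: (ungapped sequence, 0-based position, mutant residue) -> score.
ScoreFn = Callable[[str, int, str], float]


def averaged_variant_score(
    score_fn: ScoreFn,
    aligned_homologs: Sequence[str],
    column: int,
    mutant: str,
    gap: str = "-",
) -> float:
    """Average a per-sequence variant score over homologs at one alignment column."""
    scores = []
    for aligned in aligned_homologs:
        if aligned[column] == gap:
            continue  # this homolog has no residue at the column, so skip it
        ungapped = aligned.replace(gap, "")
        # Position in the ungapped sequence = number of residues before the column.
        position = len(aligned[:column].replace(gap, ""))
        scores.append(score_fn(ungapped, position, mutant))
    if not scores:
        raise ValueError("no homolog covers this alignment column")
    return mean(scores)
```

Passing the scorer as a callable keeps the sketch independent of any particular model, which matches the abstract's point that the approach needs no retraining.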

Arcadia Science
May 30, 2025

We found a simple minimum percentage identity threshold of 30% performed best

The figure reports an average, but my gut feeling is that this threshold should be protein- or protein-family-specific. I imagine the overall shape of the phylogeny and the distribution of branch lengths around a focal protein will influence how much predictive gain LFB provides. For example, it might make sense to set the threshold higher for a protein with many intermediate-divergence homologs than for one with few. An explicit analysis of which features of a protein family's phylogeny favour different thresholds could itself be very useful for guiding the application of LFB and LFB-like methods to PLM improvement.
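
To make the threshold discussion concrete, here is a minimal sketch of filtering aligned homologs by a minimum percentage identity to the focal sequence. The function names are hypothetical, and the identity definition used here (matches divided by columns where both sequences have a residue) is one of several in common use, so it may differ from the one used in the paper. The threshold is a parameter so that it can be varied per protein family, as suggested above.

```python
from typing import List


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0  # no shared columns, so treat as no identity
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def filter_homologs(
    focal: str, homologs: List[str], min_identity: float = 30.0
) -> List[str]:
    """Keep homologs whose identity to the focal sequence meets the threshold."""
    return [h for h in homologs if percent_identity(focal, h) >= min_identity]
```

Sweeping `min_identity` over a grid for each family and recording the downstream predictive performance would be one way to run the per-family analysis proposed in this comment.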

Arcadia Science
May 30, 2025

The LFB estimators proposed in this work are intentionally simple and serve as a starting point for more sophisticated inference strategies

One alternative or complementary 'modify the model' strategy that might be useful to compare with this method is protein-family-specific PLM fine-tuning. One could test how fine-tuning on a much narrower region of protein space affects a PLM's ability to soak up phylogenetic signal by checking whether a fine-tuned model is similarly improved by LFB in zero-shot fitness prediction tasks.

Arcadia Science
May 30, 2025

Sketch proof of lower variance under OU model

In addition to this theoretical calculation, would it be possible to use your data to examine the empirical variance and check that it behaves as expected?
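
One possible way to run such an empirical check is sketched below: resample subsets of per-homolog scores for a single variant and see how the variance of the subset mean changes with subset size. This is a generic resampling illustration, not the paper's derivation and not tied to the OU model. The function name, the number of draws, and the fixed seed are assumptions. Two caveats apply: homolog scores are correlated through shared phylogeny, and sampling without replacement from a finite set forces the variance to zero when the subset is the whole set.

```python
import random
from statistics import fmean, pvariance
from typing import List


def variance_of_subset_mean(
    scores: List[float], k: int, draws: int = 500, seed: int = 0
) -> float:
    """Empirical variance of the mean of k scores drawn without replacement."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    means = [fmean(rng.sample(scores, k)) for _ in range(draws)]
    return pvariance(means)
```

Plotting the returned value against `k` for many variants would show whether the variance of the averaged score shrinks in the way the theoretical sketch predicts.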

Version published to 10.1101/2025.05.20.655154v1 on bioRxiv
May 24, 2025

Understanding Protein Language Model Scaling on Mutation Effect Prediction

This article has 4 authors:
1. Chao Hou
2. Di Liu
3. Aziz Zafar
4. Yufeng Shen
This article has no evaluations
Latest version Apr 29, 2025

“Frustratingly easy” domain adaptation for cross-species transcription factor binding prediction

This article has 5 authors:
1. Mark Maher Ebeid
2. Ali Tuğrul Balcı
3. Maria Chikina
4. Panayiotis V Benos
5. Dennis Kostka
This article has no evaluations
Latest version May 26, 2025

Functional alignment of protein language models via reinforcement learning

This article has 6 authors:
1. Nathaniel Blalock
2. Srinath Seshadri
3. Agrim Babbar
4. Sarah A Fahlberg
5. Ameya Kulkarni
6. Philip A Romero
This article has no evaluations
Latest version May 8, 2025
