Variant effect prediction with reliability estimation across priority viruses

Sarah Gurev
Noor Youssef
Navami Jain
Debora S. Marks

Abstract

Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growing availability of sequence and structure data offer promise for large-scale mutation effect prediction, viruses present unique biological and informational constraints that may challenge these models. To quantify this, we introduce EVEREST (a framework for Evolutionary Variant Effect prediction with Reliability ESTimation), which assesses model performance on viral mutational fitness prediction using a curated benchmark of 45 viral deep mutational scanning (DMS) datasets (over 340 thousand variants) and quantifies model reliability in the absence of experimental data. This large-scale evaluation revealed wide differences in prediction accuracy across models and viral families. Protein language models have reached state-of-the-art performance on many mutation effect prediction tasks, yet their effectiveness for viruses has been unclear despite their increasing deployment. We find that protein language models, trained on diverse sequence corpora, underperform on viruses compared to alignment-based models trained on much smaller sets of homologous sequences. We calibrate reliability estimates against the DMS data and then apply this framework across 40 WHO-prioritized pandemic-threat viruses (over 400 thousand variants across 16 viral families), discovering that current models fail to reliably predict mutations in over half of these viruses. Our findings uncover key factors leading to underperformance, offer actionable recommendations for improving viral mutation effect prediction, and provide an objective framework for analyzing dual-use biosecurity risk.
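The benchmark described above compares each model's mutation scores with experimental DMS measurements, one dataset at a time. The abstract does not name the accuracy metric; Spearman rank correlation is a common choice in DMS benchmarks, so the minimal sketch below assumes it, along with an unweighted mean across datasets as the summary. The function names (`spearman`, `benchmark`) and the toy data are hypothetical illustrations, not EVEREST's actual code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(measured))


def benchmark(datasets):
    """Score one model on a hypothetical {name: (model scores, DMS values)} mapping.

    Returns per-dataset Spearman values and their unweighted mean (an assumption).
    """
    per_dataset = {name: spearman(p, m) for name, (p, m) in datasets.items()}
    return per_dataset, mean(list(per_dataset.values()))


if __name__ == "__main__":
    toy = {
        "virus_A": ([0.1, 0.5, 0.9, 1.3], [-2.0, -1.0, 0.0, 0.5]),
        "virus_B": ([3, 2, 1], [1, 2, 3]),
    }
    per_dataset, overall = benchmark(toy)
    print(per_dataset, overall)
```

Reporting the per-dataset values alongside the average matters here: a single pooled correlation would hide the kind of variation across viral families that the abstract highlights.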

Version published to 10.1101/2025.08.04.668549 on bioRxiv
Aug 15, 2025

