Evaluating variant effect prediction across viruses

Abstract

Viruses are a major threat to global health due to their rapid evolution, extensive diversity, and frequent cross-species transmission. Although advances in machine learning and the expanding availability of sequence and structural data have accelerated large-scale mutation effect prediction, viral proteins, and particularly fast-evolving antigenic proteins, pose unique biological and data-related challenges that may limit model performance. We introduce EVEREST, a curated dataset for evaluating model performance on (i) forecasting real-world viral evolution (31 clades across 4 viruses) and (ii) concordance with lab-based deep mutational scanning assays (45 proteins, >340,000 variants). Using EVEREST, we show that state-of-the-art protein language models trained across the protein universe substantially under-perform on viral proteins relative to alignment-based models trained on homologous proteins. This under-performance persists even in low-sequence regimes, such as those encountered during a novel viral outbreak. We develop calibrated reliability metrics to quantify confidence in model predictions where no evaluation datasets exist. For more than half of the WHO-prioritized pandemic-threat viruses, current models fail to produce reliable predictions, highlighting the urgent need for more data or new modeling approaches. Together, these findings reveal key factors driving model under-performance and provide actionable recommendations for improving viral mutation effect prediction in preparation for current and future outbreaks.
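The second benchmark task described above measures concordance between model-predicted mutation effects and deep mutational scanning (DMS) measurements. A common way to quantify such concordance is Spearman rank correlation between per-variant model scores and measured fitness effects; the minimal sketch below illustrates that computation on hypothetical placeholder data and is not necessarily the exact evaluation protocol used by EVEREST.

```python
# Sketch of one way to score concordance between model predictions and a DMS assay.
# Spearman rank correlation is a common choice for this kind of benchmark; the
# variant identifiers and values below are illustrative placeholders only.
from scipy.stats import spearmanr

# Hypothetical per-variant data: DMS-measured fitness effects and model scores
# (e.g., mutant-vs-wild-type log-likelihood ratios from a sequence model).
dms_fitness = {"A26T": -0.8, "K120N": 0.3, "S371L": 1.1, "N501Y": 0.9}
model_scores = {"A26T": -1.2, "K120N": 0.1, "S371L": 0.7, "N501Y": 1.4}

# Align the two sets of measurements on shared variants before correlating.
shared = sorted(set(dms_fitness) & set(model_scores))
rho, pvalue = spearmanr(
    [dms_fitness[v] for v in shared],
    [model_scores[v] for v in shared],
)
print(f"Spearman rho over {len(shared)} variants: {rho:.3f} (p={pvalue:.3g})")
```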
