Evaluating variant effect prediction across viruses
Abstract
Viruses are a major threat to global health due to their rapid evolution, extensive diversity, and frequent cross-species transmission. Although advances in machine learning and the expanding availability of sequence and structural data have accelerated large-scale mutation effect prediction, viral proteins, and particularly fast-evolving antigenic proteins, pose unique biological and data-related challenges that may limit model performance. We introduce EVEREST, a curated dataset for evaluating model performance on (i) forecasting real-world viral evolution (31 clades across 4 viruses) and (ii) concordance with lab-based deep mutational scanning assays (45 proteins, >340,000 variants). Using EVEREST, we show that state-of-the-art protein language models trained across the protein universe substantially under-perform on viral proteins relative to alignment-based models trained on homologous proteins. This under-performance persists even in low-sequence regimes, as is the case during a novel viral outbreak. We develop calibrated reliability metrics to quantify confidence in model predictions where no evaluation datasets exist. For more than half of the WHO-prioritized pandemic-threat viruses, current models fail to produce reliable predictions, highlighting the urgent need for more data or new modeling approaches. Together, these findings reveal key factors driving model under-performance and provide actionable recommendations for improving viral mutation effect prediction in preparation for current and future outbreaks.