Variant effect prediction with reliability estimation across priority viruses
Abstract
Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growing availability of sequence and structure data offer promise for large-scale mutation effect prediction, viruses present unique biological and informational constraints that may challenge these models. To quantify this, we introduce EVEREST, a framework for Evolutionary Variant Effect prediction with Reliability ESTimation, which assesses model performance on viral mutational fitness prediction using a curated benchmark of 45 viral deep mutational scanning (DMS) datasets (over 340 thousand variants) and quantifies model reliability in the absence of experimental data. This large-scale evaluation revealed wide differences in prediction accuracy across models and viral families. Protein language models have reached state-of-the-art performance on many mutation effect tasks, yet their effectiveness for viruses has been unclear despite their increasing deployment. We find that protein language models, trained on diverse sequence corpora, underperform on viruses compared to alignment-based models trained on much smaller sets of homologous sequences. We calibrate reliability estimates against the DMS data and then apply this framework across 40 WHO-prioritized pandemic-threat viruses (over 400 thousand variants across 16 viral families), discovering that current models fail to reliably predict mutations in over half of these viruses. Our findings uncover key factors leading to underperformance, offer actionable recommendations for improving viral mutation effect prediction, and provide an objective framework for analyzing dual-use biosecurity risk.
