Few-Shot Viral Variant Detection via Bayesian Active Learning and Biophysics
Abstract
The early detection of high-fitness viral variants is critical for pandemic response, yet limited experimental resources at the onset of variant emergence hinder effective identification. To address this, we introduce an active learning framework that integrates the protein language model ESM3, a Gaussian process with uncertainty estimation, and a biophysical model to predict the fitness of novel variants in a few-shot learning setting. By benchmarking on past SARS-CoV-2 data, we demonstrate that our method accelerates the identification of high-fitness variants by up to fivefold compared to random sampling, while requiring experimental characterization of fewer than 1% of possible variants. We also show that, when benchmarked on deep mutational scans, our framework effectively identifies sites that are frequently mutated during natural viral evolution, particularly those enabling antibody escape while preserving ACE2 binding, with a predictive advantage of up to two years over baseline strategies. Through systematic analysis of different acquisition strategies, we show that incorporating uncertainty into variant selection enables broader exploration of the sequence landscape, leading to the discovery of evolutionarily distant but potentially dangerous variants. Our results suggest that this framework could serve as an effective early warning system for identifying concerning SARS-CoV-2 variants, and possibly other emerging viruses with pandemic potential, before they achieve widespread circulation.
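The abstract credits part of the framework's reach to uncertainty-aware variant selection. The sketch below illustrates that general idea under loudly stated assumptions: random feature vectors stand in for ESM3 embeddings, an exact Gaussian process with an RBF kernel serves as the surrogate, and an upper-confidence-bound (UCB) score `mean + beta * std` serves as the acquisition function. UCB is one common uncertainty-aware acquisition rule, not necessarily the one the authors use. The biophysical model is omitted, and every function name and parameter here (`gp_posterior`, `ucb_select`, `active_learning`, `beta`, `length_scale`) is hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2, length_scale=1.0):
    # Exact GP regression with zero prior mean; returns posterior mean and std.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - (v**2).sum(0)  # prior variance of this RBF kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def ucb_select(mean, std, labeled_mask, beta=2.0):
    # Hypothetical UCB acquisition: favor high predicted fitness OR high uncertainty.
    score = mean + beta * std
    score[labeled_mask] = -np.inf  # never re-query an already measured variant
    return int(np.argmax(score))

def active_learning(features, fitness, n_init=5, n_rounds=20, beta=2.0, seed=0):
    # Start from a few randomly "measured" variants, then query one per round.
    rng = np.random.default_rng(seed)
    labeled = np.zeros(len(features), dtype=bool)
    labeled[rng.choice(len(features), n_init, replace=False)] = True
    for _ in range(n_rounds):
        mean, std = gp_posterior(features[labeled], fitness[labeled], features)
        idx = ucb_select(mean, std, labeled, beta)
        labeled[idx] = True  # simulate running the experiment on this variant
    return labeled

# Synthetic stand-in data: 500 "variants" with 8-dimensional features.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
labeled = active_learning(X, y)
print(labeled.sum())  # 25 variants measured (5 initial + 20 queried)
```

Setting `beta=0` reduces the rule to greedy exploitation of the predicted mean. Larger values push queries toward poorly characterized regions of the feature space, which is the kind of broader exploration the abstract associates with finding evolutionarily distant variants.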