From Sequences to Strategies: Early Detection of New SARS-CoV-2 Variants via Genetic Distance to Reduce Hospitalizations
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The COVID-19 pandemic highlighted the critical need for robust methods to monitor viral evolution and detect emerging variants of concern (VOCs). Traditional genomic surveillance often lacks predictive power. This study expanded an unsupervised machine learning clustering algorithm, based on SARS-CoV-2 Spike protein Levenshtein distance, to track and predict variant predominance across six European countries from 2020 to January 2024. We also investigated the influence of genetic distances and containment strategies on hospitalization rates. Sequences were transformed into temporal chains, and growth parameters were extracted via sigmoid fitting. A deep neural network (DNN) was trained to classify emerging chains as likely dominant, while a CatBoost model assessed variable importance for predicting weekly hospitalizations in Denmark. Simulations explored modifying vaccine genetic distance, containment measures, and VCR. Approximately 5,000 sequences per week enabled early chain detection within four weeks. The DNN achieved near-perfect classification of chain predominance within 3-4 weeks of appearance. Genetic distances within consecutive chains and with vaccine strains were significant predictors of hospitalizations. Simulations suggest that better-matched vaccines or stricter containment measures could reduce hospitalizations. Doubling vaccination coverage alone had minimal effect but showed additional reductions when combined with strict containment. This integrated framework demonstrates the utility of combining unsupervised and supervised machine learning for real-time tracking and prediction of SARS-CoV-2 variant dynamics and their impact on public health. Our findings underscore the critical role of genetic distances and effective public health interventions in mitigating the burden of emerging variants, supporting timely genomic surveillance and adaptive public health strategies.