Rapid Phylogenomic Analysis of Thousands of Outbreak‐Causing Viral Genomes Using Covary
Abstract
Rapid phylogenomic analysis is essential for outbreak surveillance and large-scale viral comparative genomics, yet conventional alignment-based workflows remain computationally intensive and difficult to deploy at scale. Covary is a translation-aware, alignment-free machine learning framework for large-scale biological sequence analysis that encodes genomic information into biologically informed vector representations, enabling efficient genome-scale comparison without multiple sequence alignment (MSA). Here, Covary was applied to thousands of outbreak-causing viral genomes to assess its scalability and biological resolution. A total of 4,000 complete genomes of SARS-CoV-2, dengue virus, measles virus, and alphainfluenza virus were retrieved from the NCBI Viral Genomes Resource, of which 3,831 passed quality filtering and were analyzed using Covary. Covary consistently grouped sequences according to expected taxonomic assignments and known ingroup structure, including SARS-CoV-2 Pango lineages, dengue virus subtypes, measles virus geographic origin, and alphainfluenza virus clades. The analysis completed in 45 minutes on free-tier Google Colab, inferring genome-wide relationships with modest computational resources. These results demonstrate that Covary enables rapid, alignment-free phylogenomic analysis of thousands of outbreak-causing viral genomes without requiring advanced computational infrastructure. Covary thus represents a scalable, deployment-ready machine learning pipeline for genome-informed outbreak surveillance and monitoring systems.
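
The abstract does not describe Covary's encoding, so the following is not Covary's method. As a generic illustration of what "alignment-free" comparison can look like, the sketch below uses a well-known stand-in technique: each genome is turned into a vector of normalized k-mer frequencies, and genomes are compared by cosine distance, with no multiple sequence alignment step. The function names `kmer_vector` and `cosine_distance`, the choice of k = 3, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Illustrative stand-in only; this is NOT Covary's encoding.
    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Toy sequences (hypothetical, not real viral genomes).
genome_a = "ACGTACGTACGTTTGACC"
genome_b = "ACGTACGAACGTTTGACC"
distance = cosine_distance(kmer_vector(genome_a), kmer_vector(genome_b))
```

Because every genome maps to a fixed-length vector regardless of its length, pairwise distances of this kind can be computed without aligning sequences, which is the general property that makes alignment-free approaches attractive at the scale of thousands of genomes.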