A novel approach to Caudoviricetes taxonomy utilising whole proteome structure-structure comparison
Abstract
Viral proteins often evolve so rapidly that sequence similarity is lost even among conserved functional genes, complicating phylogenetic analysis and classification. This challenge is acute for Caudoviricetes, a highly diverse class of dsDNA tailed bacteriophages recently redefined by the International Committee on Taxonomy of Viruses (ICTV). To address the limitations of sequence-based taxonomy, we present a structure-based classification framework using whole-proteome structural clustering. From 4,082 exemplar genomes, we predicted 445,098 protein structures with ESMFold and clustered them using Foldseek. Genomes were encoded as binary profiles of structural fold presence, and phylogenetic distances were inferred and normalized using Relative Evolutionary Divergence (RED). This yielded a revised structural taxonomy comprising 159 orders, 267 families, 502 subfamilies, and 1,189 genera. We also introduce PhagePleats, a Python tool for classifying novel phage genomes based on structural similarity. Our approach highlights the utility of protein structure for resolving distant viral relationships.
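The encoding step described in the abstract (genomes as binary profiles of structural fold presence) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the genome names, the cluster identifiers, and the helper functions are hypothetical, and the Jaccard distance is a stand-in, since the abstract does not state which distance measure was used. It is not the PhagePleats API, and it does not cover the RED normalization step.

```python
from itertools import combinations


def build_profiles(genome_to_clusters):
    """Encode each genome as a binary presence/absence tuple.

    genome_to_clusters maps a genome name to the set of structural
    cluster ids (hypothetical names here) found among its proteins.
    """
    clusters = sorted({c for cs in genome_to_clusters.values() for c in cs})
    profiles = {
        genome: tuple(1 if c in cs else 0 for c in clusters)
        for genome, cs in genome_to_clusters.items()
    }
    return clusters, profiles


def jaccard_distance(a, b):
    """Jaccard distance between two binary profiles (assumed metric)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - shared / union


def pairwise_distances(profiles):
    """Distance for every unordered genome pair, keyed by sorted names."""
    return {
        (g1, g2): jaccard_distance(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Toy input: three made-up genomes and five made-up fold clusters.
toy = {
    "phage_A": {"fold_1", "fold_2", "fold_3"},
    "phage_B": {"fold_2", "fold_3", "fold_4"},
    "phage_C": {"fold_5"},
}
clusters, profiles = build_profiles(toy)
distances = pairwise_distances(profiles)
```

In this toy case, phage_A and phage_B share two of four folds, while phage_C shares none with either. A distance matrix of this kind could then feed a tree-building step, after which a rank normalization such as RED would be applied.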