V- and VL-Scores Uncover Viral Signatures and Origins of Protein Families
Abstract
Viruses are key drivers of microbial diversity, nutrient cycling, and co-evolution in ecosystems, yet their study is hindered by the difficulty of culturing them. Traditional gene-centric methods, which focus on a few hallmark genes such as those encoding capsid proteins, miss much of the viral genome and leave many viral proteins and functions undiscovered. Here, we introduce two annotation-free metrics, V-score and VL-score, that quantify the virus-likeness of protein families and genomes, and we make them available through an open-access searchable database, V-Score-Search. By applying V- and VL-scores to public databases (KEGG, Pfam, and eggNOG), we link 38–77% of protein families with viruses, a 9–16× increase over current estimates. These metrics outperform existing approaches, enabling precise detection of viral genomes, prophages, and host-derived auxiliary viral genes (AVGs) from fragmented sequences, and significantly improving genome binning. Notably, we identify up to 17× more AVGs, dominated by non-metabolic proteins of unknown function. These metrics provide new insights into viral signatures and virus–host interactions, with implications ranging from genomics to biotechnology.