The hidden prevalence and unique protein folds of huge phages

Abstract

Huge phages are remarkable for their expansive genomes (≥200 kbp, and up to 841 kbp to date), global distribution, and intricate functional repertoires. However, their study has been hindered by challenges in genome identification and the lack of a comprehensive reference database. Using vRep, we established the Huge Phage Genome Collection (HPGC), a curated resource of 7,295 non-redundant huge phage genomes spanning 4,613 species-level groups. Within HPGC, we identified an overlooked and prevalent clade, HP8026, which constitutes ~1.2% of human gut metagenomic reads, suggesting its potential as a marker for monitoring fecal contamination and as a promising modulator of gut microbiome dynamics. Complementing HPGC, we developed the Huge Phage Protein Structure Database (HPDB), which catalogs 15,594 structure clusters derived from 589,960 predicted proteomes. Notably, 40% of these clusters, present only in huge phages, exhibit specific protein folds. Additionally, 3% of the HPDB clusters were absent from other viral forms, including the plasmid-derived ArdC, which could protect huge phages from the antiviral mechanisms of their hosts. Collectively, HPGC and HPDB provide a pioneering framework for exploring the evolution, functional innovation, and biotechnological potential of huge phages, while serving as indispensable references for viral ecology and structural biology.
