Bacteriophage genomics: What has five years of INPHARED taught us?
Abstract
Bacteriophages are key drivers of microbial ecology and evolution, and the rapid expansion of phage sequencing has created sustained demand for curated reference genome databases. We released the INfrastructure for a PHAge REference Database (INPHARED) in January 2021 to provide quality-controlled metadata for complete phage genomes from cultured isolates. Here, we compare the 2021 and 2026 snapshots, spanning a five-year period that included a substantial overhaul of bacterial virus taxonomy by the ICTV. The database has approximately doubled, from 14,244 to 28,777 genomes, yet the proportion representing novel species-level diversity has declined, indicating that redundant sequencing is outpacing new discovery. Host bias persists despite the addition of 97 new host genera. We have incorporated genome quality assessments, lifestyle predictions, and defence and anti-defence system annotations, providing an updated resource and a snapshot of the current state of phage genomics.