Highly accurate prophage island detection with PIDE
Abstract
As important mobile elements in prokaryotes, prophages shape the genomic context of their hosts and regulate the structure of bacterial populations. However, precisely identifying prophages with computational methods remains challenging. Here, we introduce PIDE, a tool for identifying prophages in bacterial genomes or metagenome-assembled genomes. PIDE integrates a pre-trained protein language model with a gene density clustering algorithm to distinguish prophages. Benchmarking on bacterial genomes with experimental prophage annotations demonstrates that PIDE pinpoints prophages with precise boundaries. Applying PIDE to 4,744 representative human gut genomes reveals 24,467 prophages with widespread functional capacity. PIDE is open source and available at https://github.com/chyghy/PIDE.
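The abstract does not spell out how gene density clustering works, so the following is only an illustrative sketch of the general idea, not PIDE's actual implementation. It assumes each gene already has a phage-likeness score between 0 and 1 (for example, from a classifier built on protein language model embeddings) and groups nearby high-scoring genes into candidate islands. The function name `cluster_phage_genes` and the parameters `score_threshold`, `max_gap`, and `min_genes` are hypothetical and chosen for illustration; the real tool's parameters and logic should be taken from its repository.

```python
from typing import List, Tuple


def cluster_phage_genes(
    positions: List[int],
    scores: List[float],
    score_threshold: float = 0.5,  # assumed cutoff for calling a gene phage-like
    max_gap: int = 3000,           # assumed max distance (bp) between neighbouring hits
    min_genes: int = 5,            # assumed minimum number of hits per island
) -> List[Tuple[int, int]]:
    """Group phage-like genes into candidate islands.

    positions: gene midpoints in bp, sorted in ascending order.
    scores:    per-gene phage-likeness in [0, 1], same order as positions.
    Returns a list of (start, end) coordinates, one per candidate island.
    """
    # Keep only genes whose score passes the threshold.
    hits = [p for p, s in zip(positions, scores) if s >= score_threshold]

    islands: List[Tuple[int, int]] = []
    current: List[int] = []
    for pos in hits:
        # A large gap to the previous hit closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_genes:
                islands.append((current[0], current[-1]))
            current = []
        current.append(pos)

    # Flush the final cluster.
    if len(current) >= min_genes:
        islands.append((current[0], current[-1]))
    return islands


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    positions = [1000, 2000, 3000, 4000, 5000, 20000, 40000, 41000]
    scores = [0.9, 0.8, 0.95, 0.7, 0.9, 0.9, 0.2, 0.9]
    print(cluster_phage_genes(positions, scores, min_genes=3))
```

In this toy run, the five densely packed high-scoring genes form one island, while the isolated hits at 20,000 bp and 41,000 bp are discarded because they fall below the minimum cluster size. Boundary refinement, which the abstract highlights as a strength of PIDE, is not modeled here.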