Phylogeny-agnostic strain-level prediction of phage-host interactions from genomes
Abstract
Bacteriophages offer promising alternatives to antibiotics for treating drug-resistant infections and for engineering microbiomes, but their application is limited by the inability to select phages that infect specific strains. Selecting suitable phages requires either one-to-one experimental assays or strain-level predictions of phage-host interactions. Existing computational approaches either predict host taxonomy at broad ranks, which is unsuitable for strain-level targeting, or require species-specific mechanistic knowledge, which limits generalizability. Here, we present a phylogeny-agnostic machine learning framework that predicts strain-level phage-host interactions across diverse bacterial genera from genome sequences alone. By systematically optimizing the workflow over 13.2 million training runs across five datasets (128,357 interactions, 1,058 strains, 560 phages), we achieved performance matching that of species-specific methods (AUROC 0.67-0.94) while eliminating phylogenetic constraints. Comprehensive feature engineering identifies biologically interpretable genetic determinants while minimizing overfitting in sparse, imbalanced datasets. Experimental validation on 1,328 novel interactions confirmed generalizability (AUROC 0.84), and genome-wide RB-TnSeq screens verified that 68.8% of experimentally identified infection mediators were captured computationally, including receptors and cell wall biosynthesis pathways. Model-guided cocktail design achieved 80.7-97.0% strain coverage with five phages and improved one-shot phage selection by up to 6.2-fold over promiscuity-based selection. This platform enables rational phage therapy design and precision microbiome engineering, with applications in combating antimicrobial resistance across clinical, agricultural, and industrial contexts.
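To make the idea of cocktail design concrete, the sketch below shows one simple way a set of predicted interactions could be turned into a five-phage cocktail: a greedy maximum-coverage heuristic. The abstract does not state which selection procedure was used, so this is an illustrative assumption and not the authors' method. The function names (`greedy_cocktail`, `coverage`), the phage and strain labels, and the toy predictions are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_cocktail(
    predicted_hosts: Dict[str, Set[str]],
    strains: Set[str],
    k: int = 5,
) -> List[str]:
    """Pick up to k phages, each time adding the phage predicted to
    infect the most strains that are not yet covered (assumed heuristic)."""
    uncovered = set(strains)
    chosen: List[str] = []
    for _ in range(k):
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(
            (p for p in sorted(predicted_hosts) if p not in chosen),
            key=lambda p: len(predicted_hosts[p] & uncovered),
            default=None,
        )
        # Stop early if no remaining phage adds any new strain.
        if best is None or not (predicted_hosts[best] & uncovered):
            break
        chosen.append(best)
        uncovered -= predicted_hosts[best]
    return chosen


def coverage(
    chosen: List[str],
    predicted_hosts: Dict[str, Set[str]],
    strains: Set[str],
) -> float:
    """Fraction of target strains predicted to be infected by at least one chosen phage."""
    covered = set().union(*(predicted_hosts[p] for p in chosen))
    return len(covered & strains) / len(strains)


# Hypothetical toy predictions: phage -> strains it is predicted to infect.
toy_predictions = {
    "phiA": {"s1", "s2", "s3"},
    "phiB": {"s3", "s4"},
    "phiC": {"s5"},
    "phiD": {"s1"},
}
toy_strains = {"s1", "s2", "s3", "s4", "s5", "s6"}

cocktail = greedy_cocktail(toy_predictions, toy_strains, k=5)
print(cocktail, coverage(cocktail, toy_predictions, toy_strains))
```

In this toy example the loop stops after three phages because the fourth adds no new strain, and one strain remains uncovered because no phage is predicted to infect it. In practice the input sets would come from thresholding a model's predicted interaction probabilities, and the choice of threshold is a separate design decision not covered here.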