Enabling the prediction of phage receptor specificity from genome data
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Predicting which receptor a phage binds to from genome sequence alone has remained an intractable challenge, principally because the experimental phenotypic data required to train and validate predictive models have not been available at sufficient scale. Here we address this by conducting 1,050 genome-wide genetic screens across 255 taxonomically diverse Escherichia coli dsDNA phages, assigning host receptors to 193 phages across 19 receptor classes. Comparative genomics and AlphaFold3 structural modelling resolved the sequence determinants of specificity to defined receptor-binding protein domains and individual residues. Machine learning models trained on this dataset predicted host receptor identity from phage genome sequence alone without prior annotation of receptor-binding genes, achieving perfect precision and greater than 80% recall on 49 independently validated phages, and yielding predictions for 1,060 of 1,875 E. coli phage genomes in NCBI. Domain swaps redirected receptor specificity as predicted, and a single amino acid substitution proved both necessary and sufficient to switch recognition between two distinct porins. These results demonstrate that systematic phenotyping at scale makes sequence-based prediction of molecular interaction specificity tractable, with direct implications for phage-based medicine, microbiome engineering and the broader challenge of inferring host-pathogen interaction outcomes from sequence.
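The precision and recall figures quoted in the abstract can be read with a small worked example. The sketch below assumes that the model may decline to predict a receptor for some phages, that precision is computed over the phages for which a prediction was made, and that recall is computed over all phages with a validated receptor. These definitions, the function name `precision_recall`, and the toy labels are illustrative assumptions and are not taken from the preprint.

```python
from typing import Optional, Sequence


def precision_recall(
    truth: Sequence[str], predicted: Sequence[Optional[str]]
) -> tuple[float, float]:
    """Precision and recall when the model may abstain (prediction is None).

    Assumed definitions (not taken from the preprint):
      precision = correct calls / phages that received a call
      recall    = correct calls / all phages with a known receptor
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Keep only the phages for which the model made a call.
    called = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    correct = sum(1 for t, p in called if t == p)

    precision = correct / len(called) if called else float("nan")
    recall = correct / len(truth) if truth else float("nan")
    return precision, recall


# Made-up labels for five hypothetical phages; not data from the study.
truth = ["OmpC", "LamB", "FhuA", "OmpF", "Tsx"]
predicted = ["OmpC", "LamB", "FhuA", "OmpF", None]  # the model abstains once

precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumptions, every call made is correct, so precision is perfect, while the single abstention lowers recall. The same pattern would be consistent with a model that returns predictions for only part of a genome collection.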
Article activity feed
- This is cool. Something I'm curious about: now that you've done this training, can you back-calculate the minimum number and diversity of experimental screens that one might need to conduct to achieve similar results? That could be super useful guidance for others studying phage-host pairings and predictions in other species, so that they set those up well from an economic standpoint. And then that could open up way more G-P atlases across other bacteria-phage systems.
- These are great resources. Thank you so much for building these for the community.