Synteny-aware functional annotation of bacteriophage genomes with Phynteny
Abstract
Accurate genome annotation is fundamental to decoding viral diversity and understanding bacteriophage biology; yet, the majority of bacteriophage genes remain functionally uncharacterised. Bacteriophage genomes often exhibit conserved gene order, or synteny, that reflects underlying constraints in genome architecture and expression. Here, we present Phynteny, a genome-scale deep learning framework that leverages gene synteny to predict the function of unknown bacteriophage genes. Phynteny integrates protein language model embeddings with positional encoding, bidirectional long short-term memory, and transformer encoders featuring circular attention to learn genome-wide organisational patterns. Trained on a dereplicated dataset of over 280,000 bacteriophage genomes, Phynteny achieves high predictive performance (AUC > 0.84) across the nine PHROG functional categories and confidently assigns putative functions that increase the number of annotated genes in phage isolate genomes by 14%. To assess the validity of these predictions, we compared them with annotations derived independently from protein structural information, revealing broad functional concordance and providing additional confidence in Phynteny's predictions. By incorporating genomic context into functional annotation, Phynteny offers a novel approach for illuminating the functional landscape of viral dark matter; it is available at https://github.com/susiegriggo/Phynteny_transformer.
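The abstract names circular attention without defining it. As a rough intuition only, the sketch below assumes that "circular" means gene positions are compared by their shortest distance around a circular genome, so the first and last genes count as neighbours. The function names and the ALiBi-style linear distance penalty (`slope`) are hypothetical illustrations chosen for this sketch, not Phynteny's published mechanism.

```python
import math
from typing import List

Matrix = List[List[float]]


def circular_distance(i: int, j: int, n: int) -> int:
    """Shortest distance between gene positions i and j on a circular genome of n genes."""
    d = abs(i - j) % n
    return min(d, n - d)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def circular_attention(queries: Matrix, keys: Matrix, values: Matrix,
                       slope: float = 0.5) -> Matrix:
    """Single-head scaled dot-product attention with a circular distance penalty.

    Hypothetical illustration: each score q_i . k_j / sqrt(d_k) is reduced by
    slope * circular_distance(i, j, n), an ALiBi-style bias assumed here,
    so nearby genes (including across the wrap-around point) get more weight.
    """
    n = len(queries)
    d_k = len(queries[0])
    d_v = len(values[0])
    outputs: Matrix = []
    for i in range(n):
        scores = []
        for j in range(n):
            dot = sum(q * k for q, k in zip(queries[i], keys[j]))
            scores.append(dot / math.sqrt(d_k) - slope * circular_distance(i, j, n))
        weights = softmax(scores)
        # Weighted sum of value vectors for gene i
        outputs.append([sum(w * values[j][c] for j, w in enumerate(weights))
                        for c in range(d_v)])
    return outputs


if __name__ == "__main__":
    n = 6
    q = [[0.0] for _ in range(n)]  # zero queries/keys: weights depend only on distance
    v = [[1.0 if c == j else 0.0 for c in range(n)] for j in range(n)]  # one-hot values expose the weights
    out = circular_attention(q, q, v)
    print([round(w, 3) for w in out[0]])
```

With one-hot values, each output row equals that gene's attention weights, so gene 0 gives equal weight to genes 1 and 5. A plain linear distance would instead place genes 0 and 5 five positions apart, which is the wrap-around effect a circular scheme is meant to avoid.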