Empathi: Embedding-based Phage Protein Annotation Tool by Hierarchical Assignment
Abstract
Bacteriophages, the viruses that infect bacteria, are estimated to outnumber their cellular hosts tenfold and are key players in all microbial ecosystems. Under evolutionary pressure from their hosts, they evolve rapidly and encode a large diversity of protein sequences. Consequently, the functions of most phage proteins remain unknown. Current tools for comprehensively identifying phage protein functions from sequence lack either sensitivity (for instance, those relying on homology) or specificity (those assigning a single coarse-grained function to each protein). Here, we introduce Empathi, a protein-embedding-based classifier that assigns functions hierarchically, from general functional categories such as “structural” and “DNA-associated” proteins to more specific ones including “nucleases”, “tail appendages” and “endolysins”. These categories were tailored to phage protein functions and organized so that molecular-level functions are respected within each category, making them well suited for training machine learning classifiers based on protein embeddings. We show on a dataset of cultured phage genomes that Empathi significantly outperforms homology-based methods, tripling the number of annotated homologous groups. On the EnVhog database, the most recent and extensive database of metagenomically sourced phage proteins, Empathi doubled the annotated fraction of protein families from 16% to 33%. On complete genomes taken from new viromes, our method annotates almost twice as many proteins, its predictions are consistent with those of existing tools, and they are highly colocalized. In addition, Empathi’s ability to assign multiple labels to the same protein makes it possible to identify multifunctional proteins such as virion-associated lysins.
A more global view of the repertoire of functions that a phage possesses will help to better understand phages and their interactions with bacteria.
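To make the idea of hierarchical, multi-label assignment concrete, here is a minimal sketch. It is not Empathi's implementation or API. The category names are taken from the abstract, but the parent/child arrangement, the `lysis` parent, the rule that a child is kept only when its parent passes, the 0.5 threshold and the scores are all assumptions made for illustration.

```python
from typing import Dict, List

# Illustrative hierarchy: general category -> more specific categories.
# The arrangement and the "lysis" parent name are hypothetical.
HIERARCHY: Dict[str, List[str]] = {
    "structural": ["tail appendages"],
    "DNA-associated": ["nucleases"],
    "lysis": ["endolysins"],
}


def assign_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Assign labels top-down from per-category scores.

    A general category is assigned when its score reaches the threshold.
    A specific category is assigned only if its parent was assigned too
    (an assumed gating rule). Several branches may be assigned at once,
    so a protein can carry multiple labels.
    """
    assigned: List[str] = []
    for parent, children in HIERARCHY.items():
        if scores.get(parent, 0.0) < threshold:
            continue  # parent rejected: skip the whole branch
        assigned.append(parent)
        for child in children:
            if scores.get(child, 0.0) >= threshold:
                assigned.append(child)
    return assigned


# Made-up scores for one protein, standing in for the outputs of
# per-category classifiers trained on protein embeddings.
example_scores = {
    "structural": 0.91,
    "tail appendages": 0.34,
    "DNA-associated": 0.08,
    "nucleases": 0.72,
    "lysis": 0.88,
    "endolysins": 0.81,
}

labels = assign_labels(example_scores)
print(labels)
```

In this toy case the protein receives both a structural label and a lysis-related label, which is the kind of multi-label outcome that would flag a candidate multifunctional protein. The high `nucleases` score is ignored because its assumed parent category did not pass the threshold.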