Context-based protein function prediction in bacterial genomes
Abstract
Motivation
The rapid growth of sequencing data from high-throughput technologies has made it increasingly important to determine the functions of unannotated genes. Recent advances in deep learning allow researchers to use a variety of features to predict protein functions. Traditionally, these algorithms treat proteins as independent functional units or consider interactions only at the protein level. However, prokaryotes often preserve specific genomic neighborhoods over evolutionary time, and these neighborhoods provide valuable context for predicting protein functions. This context can come from genes located near the gene of interest or from syntenic regions, in which the order of genes on the chromosome is conserved because of common ancestry.
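As a concrete illustration of what "genomic context" can mean, the following minimal sketch collects the genes flanking a gene of interest on a contig. It is not taken from the authors' code: the function name `context_window`, the `flank` parameter, the gene labels, and the assumption of a linear (non-circular) contig are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def context_window(genes: Sequence[str], index: int, flank: int = 2) -> List[str]:
    """Return up to `flank` genes on each side of genes[index].

    Assumptions (illustrative only): `genes` lists gene identifiers in
    genomic order on a linear contig, so the window is truncated at the
    contig ends rather than wrapping around a circular chromosome.
    The gene of interest itself is excluded from the result.
    """
    if not 0 <= index < len(genes):
        raise IndexError("index out of range")
    start = max(0, index - flank)
    end = min(len(genes), index + flank + 1)
    return [g for i, g in enumerate(genes[start:end], start=start) if i != index]


# Hypothetical contig with placeholder gene labels.
contig = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
neighbors = context_window(contig, 2, flank=2)
edge_neighbors = context_window(contig, 0, flank=2)
```

A context-based model would take a neighborhood like `neighbors`, typically as protein representations rather than labels, as input alongside the gene of interest. The window size and the handling of contig ends are design choices that the text above does not specify.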
Results
We developed a transformer-based model that pre-trains representations of proteins based on their genomic context, and we use this model to predict protein functions. Our results show that context-based protein representations capture context-specific functional semantics and can effectively predict protein functions. We use our model to investigate how phylogenetic distance and homology influence the performance of context-dependent function prediction, and find that synteny affects prediction performance substantially, except for some functions that are determined by the genomic context. Our experiments provide insights into the factors that affect the performance and applicability of context-based function prediction methods across diverse prokaryotic genomes and metagenomes.
Availability and implementation
The generated model, including all training code and generated data, is freely available at https://github.com/bio-ontology-research-group/Genomic_context .
Contact
robert.hoehndorf@kaust.edu.sa