Context-based protein function prediction in bacterial genomes

Abstract

Motivation

The rapid growth of sequencing data from high-throughput technologies has emphasized the need to uncover the functions of unannotated genes. Recent advances in deep learning have enabled researchers to use a variety of features to predict protein functions. Traditionally, these algorithms treat proteins as independent functional units or consider interactions only at the protein level. However, prokaryotes often preserve specific genomic neighborhoods over evolutionary time, which provides valuable context for predicting protein functions. This context can come from genes located near the gene of interest or from syntenic regions, in which the order of genes on the chromosome is conserved because of common ancestry.
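To make the notion of a genomic neighborhood concrete, the following is a minimal sketch of how the genes flanking a gene of interest might be collected. It is an illustration only and not the authors' implementation: the `Gene` record, the `context_window` function, and the window size `k` are hypothetical names introduced here, and the sketch ignores strand orientation and chromosome circularity.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    # Hypothetical minimal gene record (assumption, not from the article).
    contig: str       # contig or chromosome identifier
    start: int        # start coordinate on the contig
    protein_id: str   # identifier of the encoded protein


def context_window(genes: List[Gene], target_id: str, k: int = 2) -> List[str]:
    """Return protein IDs of the target gene and up to k neighbors on each side.

    Neighbors are taken from the same contig, ordered by start coordinate.
    Raises StopIteration if target_id is not present in genes.
    """
    target = next(g for g in genes if g.protein_id == target_id)
    # Keep only genes on the target's contig, in chromosomal order.
    same_contig = sorted(
        (g for g in genes if g.contig == target.contig),
        key=lambda g: g.start,
    )
    i = same_contig.index(target)
    lo = max(0, i - k)  # clamp at the contig start; slicing clamps at the end
    return [g.protein_id for g in same_contig[lo : i + k + 1]]
```

A window of this kind could then serve as the input sequence for a context-based model, with each protein replaced by a token or embedding; how the published model encodes its input is described in the full article and repository, not here.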

Results

We developed a transformer-based model that pre-trains representations of proteins based on their genomic context, and we use this model to predict protein functions. Our results show that context-based protein representations capture context-specific functional semantics and can effectively predict protein functions. We use the model to investigate how phylogenetic distance and homology influence the performance of context-dependent function prediction, and find that synteny affects prediction performance substantially, except for some functions that are determined by the genomic context. Our experiments provide insights into the factors that affect the performance and applicability of context-based function prediction methods across diverse prokaryotic genomes and metagenomes.

Availability and implementation

The generated model, including all training code and generated data, is freely available at https://github.com/bio-ontology-research-group/Genomic_context .

Contact

robert.hoehndorf@kaust.edu.sa
