Beyond protein functions: evaluating completeness, coherence, and consistency of genome-scale function annotations

Abstract

Protein function annotation has traditionally followed a reductionist approach, assigning functions to individual proteins acting in isolation. This paradigm treats each annotation as an independent fact, disconnected from the broader biological system. However, proteins operate within integrated cellular networks where their functions depend on genomic context and the presence of interacting partners. Here, we develop a genome-scale evaluation framework that assesses whether annotated protein functions could plausibly coexist within a living organism. We formalize three criteria grounded in systems biology principles: completeness (presence of essential functions), coherence (satisfaction of functional dependencies), and consistency (absence of mutually exclusive functions). Applying this framework to bacterial genomes, we evaluated manually curated annotations from six model organisms and computational predictions from six methods. While model organism annotations largely satisfied our constraints (with violations primarily reflecting host-pathogen interactions), all computational prediction methods systematically failed to produce biologically plausible genome-scale annotations. Methods achieved high accuracy for individual proteins yet produced incomplete metabolic pathways, incoherent protein complexes, and taxonomically impossible function combinations. These results reveal a fundamental disconnect between the reductionist annotation model and the systems-level requirements of biological organisms. Current computational methods amplify this disconnect because they are optimized for protein-level accuracy while ignoring genome-scale constraints. Our framework provides quantitative metrics for evaluating biological plausibility and establishes a foundation for developing system-aware annotation approaches. The shift from reductionist to systems-level perspectives will be essential for annotating the rapidly growing collection of sequenced genomes and metagenomes.
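As a rough illustration of the three criteria, the sketch below checks a toy genome annotation for missing essential functions (completeness), unmet functional dependencies (coherence), and mutually exclusive pairs (consistency). It is not the authors' implementation: the `Constraints` class, the `evaluate` function, and all function labels and rules are hypothetical placeholders chosen for the example, and a real rule set would come from curated biological knowledge.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    """Hypothetical rule set; every entry used below is an illustrative placeholder."""
    essential: set[str]                # functions assumed necessary for a viable genome
    requires: dict[str, set[str]]      # function -> functions it depends on
    exclusive: list[tuple[str, str]]   # pairs of functions assumed unable to coexist


def evaluate(annotations: dict[str, set[str]], rules: Constraints) -> dict[str, list]:
    """Return the violations of each criterion for one genome-scale annotation."""
    # Pool the functions of all proteins: the criteria apply to the genome as a whole.
    present = set().union(*annotations.values())

    # Completeness: essential functions that no protein is annotated with.
    missing = sorted(rules.essential - present)

    # Coherence: annotated functions whose required partners are absent.
    unmet = sorted(
        (func, dep)
        for func in rules.requires
        if func in present
        for dep in rules.requires[func] - present
    )

    # Consistency: mutually exclusive functions that are both annotated.
    conflicts = sorted(
        (a, b) for a, b in rules.exclusive if a in present and b in present
    )

    return {"completeness": missing, "coherence": unmet, "consistency": conflicts}


# Toy input: protein identifier -> set of annotated function labels (all invented).
toy_annotations = {
    "protein_1": {"replication", "complex_subunit_A"},
    "protein_2": {"translation"},
}
toy_rules = Constraints(
    essential={"replication", "transcription", "translation"},
    requires={"complex_subunit_A": {"complex_subunit_B"}},
    exclusive=[("trait_X", "trait_Y")],
)

report = evaluate(toy_annotations, toy_rules)
for criterion, violations in report.items():
    print(criterion, violations)
```

An empty list for a criterion means no violation was found under the supplied rules. Note that each protein in the toy input could be individually correct while the pooled annotation still fails completeness and coherence, which is the gap between protein-level accuracy and genome-scale plausibility that the abstract describes.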

Protein function prediction methods are evaluated by their accuracy on individual proteins, but proteins operate within integrated biological systems with strict functional requirements. We developed a framework that evaluates whether predicted protein functions could plausibly coexist in a living organism by checking for completeness of essential functions, coherence of functional dependencies, and consistency with biological constraints. While manually curated annotations largely satisfy these requirements, all computational prediction methods systematically fail to produce biologically viable genome-scale annotations. This reveals a fundamental disconnect between current evaluation paradigms and the systems-level requirements of biology, highlighting the need for prediction methods that consider genome-scale constraints rather than optimizing for individual protein accuracy.
