Beyond protein functions: evaluating completeness, coherence, and consistency of genome-scale function annotations

Rund Tawfiq
Maxat Kulmanov
Robert Hoehndorf

Abstract

Protein function annotation has traditionally followed a reductionist approach, assigning functions to individual proteins acting in isolation. This paradigm treats each annotation as an independent fact, disconnected from the broader biological system. However, proteins operate within integrated cellular networks where their functions depend on genomic context and the presence of interacting partners. Here, we develop a genome-scale evaluation framework that assesses whether annotated protein functions could plausibly coexist within a living organism. We formalize three criteria grounded in systems biology principles: completeness (presence of essential functions), coherence (satisfaction of functional dependencies), and consistency (absence of mutually exclusive functions). Applying this framework to bacterial genomes, we evaluated manually curated annotations from six model organisms and computational predictions from six methods. While model organism annotations largely satisfied our constraints, with violations primarily reflecting host–pathogen interactions, all computational prediction methods systematically failed to produce biologically plausible genome-scale annotations. Methods achieved high accuracy for individual proteins yet produced incomplete metabolic pathways, incoherent protein complexes, and taxonomically impossible function combinations. These results reveal a fundamental disconnect between the reductionist annotation model and the systems-level requirements of biological organisms. Current computational methods amplify this disconnect as they are optimized for protein-level accuracy while ignoring genome-scale constraints. Our framework provides quantitative metrics for evaluating biological plausibility and establishes a foundation for developing system-aware annotation approaches. The shift from reductionist to systems-level perspectives will be essential for annotating the rapidly growing collection of sequenced genomes and metagenomes.

Protein function prediction methods are evaluated by their accuracy on individual proteins, but proteins operate within integrated biological systems with strict functional requirements. We developed a framework that evaluates whether predicted protein functions could plausibly coexist in a living organism by checking for completeness of essential functions, coherence of functional dependencies, and consistency with biological constraints. While manually curated annotations largely satisfy these requirements, all computational prediction methods systematically fail to produce biologically viable genome-scale annotations. This reveals a fundamental disconnect between current evaluation paradigms and the systems-level requirements of biology, highlighting the need for prediction methods that consider genome-scale constraints rather than optimizing for individual protein accuracy.
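The three criteria can be read as set checks over a single, genome-level pool of functions rather than over proteins one at a time. The sketch below illustrates that idea under stated assumptions. Annotations are modeled as a mapping from protein IDs to function identifiers. Constraints are supplied as a plain set of essential functions, pairwise dependency rules ("if A is present, B must be present"), and pairwise exclusion rules ("A and B must not co-occur"). Each score is a simple fraction. The names `Constraints`, `genome_functions`, `evaluate`, and the placeholder labels (`essential_1`, `subunit_a`, `trait_x`, ...) are hypothetical. This is not the authors' implementation, and it does not reproduce their exact scoring.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical model: protein identifier -> set of function identifiers.
Annotation = Dict[str, Set[str]]


@dataclass(frozen=True)
class Constraints:
    """Hypothetical constraint sets; a real framework would derive these from curated knowledge."""
    essential: FrozenSet[str]                  # functions the organism must have somewhere
    dependencies: Tuple[Tuple[str, str], ...]  # (A, B): if A is present, B must also be present
    exclusions: Tuple[Tuple[str, str], ...]    # (A, B): A and B must not both be present


def genome_functions(annotation: Annotation) -> Set[str]:
    """Pool all protein-level annotations into one genome-level function set."""
    pooled: Set[str] = set()
    for funcs in annotation.values():
        pooled |= funcs
    return pooled


def evaluate(annotation: Annotation, c: Constraints) -> Dict[str, float]:
    funcs = genome_functions(annotation)

    # Completeness: fraction of essential functions present anywhere in the genome.
    completeness = len(c.essential & funcs) / len(c.essential) if c.essential else 1.0

    # Coherence: among dependencies whose premise A is present, fraction whose B is also present.
    triggered = [(a, b) for a, b in c.dependencies if a in funcs]
    satisfied = [(a, b) for a, b in triggered if b in funcs]
    coherence = len(satisfied) / len(triggered) if triggered else 1.0

    # Consistency: fraction of exclusion rules not violated (both sides present = violation).
    violated = [(a, b) for a, b in c.exclusions if a in funcs and b in funcs]
    consistency = 1.0 - len(violated) / len(c.exclusions) if c.exclusions else 1.0

    return {"completeness": completeness, "coherence": coherence, "consistency": consistency}


# Illustrative (hypothetical) constraints and predictions.
constraints = Constraints(
    essential=frozenset({"essential_1", "essential_2"}),
    dependencies=(("subunit_a", "subunit_b"),),
    exclusions=(("trait_x", "trait_y"),),
)
predicted = {
    "P1": {"essential_1", "subunit_a"},
    "P2": {"trait_x"},
    "P3": {"trait_y"},
}
print(evaluate(predicted, constraints))
# {'completeness': 0.5, 'coherence': 0.0, 'consistency': 0.0}
```

Pooling before checking is the point of the design. Each of the three proteins above could carry a label that looks reasonable in isolation, yet the genome-level checks flag a missing essential function, a complex subunit without its partner, and two mutually exclusive traits in the same organism.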

Version published on bioRxiv (DOI: 10.1101/2025.07.14.664848), Jul 18, 2025