Systematic assessment of homology-based methods for fine-grained functional annotation using diverse protein families

Rakesh Busi
Pranav Machingal
Nandyala Hemachandra
Petety V. Balaji


Abstract

The protein sequence database is growing without a commensurate increase in the number of proteins with known molecular function, especially at the fine-grained level. Alignment-based approaches such as BLAST and profile hidden Markov models (HMMs) are widely used to infer homology and to transfer annotation, which is then to be confirmed experimentally. The ability of BLASTp to distinguish orthologs from paralogs varies across protein families; for profile HMMs, this ability depends on the sequences used to generate the multiple sequence alignments. In this study, we systematically evaluated the performance of BLASTp and HMM-based methods for fine-grained function annotation using carefully curated protein datasets that are diverse in their sequence-structure-function relationships. As expected, BLASTp performed well in detecting close homologs but failed to detect remote homologs. Depending on the family, BLASTp detected homology in 22.6% to 100% of sequence pairs within a homologous protein family. Sequence identity between trypsin and chymotrypsin sequences is high despite differences in their fine-grained molecular function. Transferring function annotation based on homology inferred from BLASTp therefore leads to errors in trypsin-chymotrypsin-like situations.
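A per-family figure such as "homology detected in X% of sequence pairs" can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `detection_fraction`, the sequence identifiers, and the hit list are all hypothetical, and the sketch assumes that BLASTp hits have already been filtered by some significance threshold and reduced to pairs of sequence identifiers.

```python
from itertools import combinations


def detection_fraction(family, detected_pairs):
    """Return the fraction of unordered sequence pairs in `family`
    for which a homology hit was reported (hypothetical helper)."""
    members = sorted(set(family))
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    # Treat (a, b) and (b, a) as the same pair; self-hits never match
    # because a one-element frozenset cannot equal a two-element one.
    detected = {frozenset(p) for p in detected_pairs}
    found = sum(1 for a, b in pairs if frozenset((a, b)) in detected)
    return found / len(pairs)


# Illustrative data only: four made-up sequences and made-up hits.
family = ["seqA", "seqB", "seqC", "seqD"]
hits = [("seqA", "seqB"), ("seqB", "seqA"), ("seqC", "seqD"), ("seqA", "seqC")]
fraction = detection_fraction(family, hits)
print(f"{fraction:.1%}")
```

Counting unordered pairs means that reciprocal hits are not double counted; whether a one-directional hit should count as detected homology is a design choice that this sketch answers with "yes".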

Profile HMMs improved sensitivity and captured subtle homology signals even when sequence identity was low, though some known family members scored below the threshold because of functional divergence or mutations at catalytic sites. We further showed that relying solely on homology for annotation transfer can lead to misleading conclusions when proteins have evolved divergent functions despite structural similarity. Our findings highlight that a cautious approach combining BLASTp, profile HMMs, and expert domain knowledge provides the most reliable strategy for functional annotation. After all, not every family member may be doing what we think it is doing.
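One way to picture such a cautious approach is a simple decision rule that transfers an annotation only when all lines of evidence agree. The sketch below is purely illustrative and is not taken from the article: the function `annotation_decision`, its three boolean inputs, and the returned labels are assumptions, with `key_residues_conserved` standing in for expert knowledge such as an intact catalytic site.

```python
def annotation_decision(blast_hit, hmm_hit, key_residues_conserved):
    """Hypothetical rule combining three lines of evidence.

    blast_hit: a significant BLASTp hit to an annotated family member exists
    hmm_hit: the sequence scores above the family profile HMM threshold
    key_residues_conserved: an expert-defined check (assumed here) passes
    """
    if blast_hit and hmm_hit and key_residues_conserved:
        return "transfer"       # all evidence agrees
    if blast_hit or hmm_hit:
        return "review"         # homology signal without full support
    return "no_annotation"      # no homology signal at all


# A trypsin-chymotrypsin-like case: homology is clear, but the
# function-determining residues differ, so the call goes to review.
print(annotation_decision(True, True, False))
```

Routing partial evidence to manual review rather than rejecting it reflects the point that a below-threshold score does not by itself rule out family membership.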

Version published to 10.1101/2025.09.17.676781 on bioRxiv
Sep 20, 2025

Related articles

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations
Latest version Jan 29, 2026

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations
Latest version Jan 9, 2026

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluations
Latest version Dec 23, 2025
