Benchmarking protein sequence and structure search methods for remote homology detection

Abstract

Background

Similarity search over protein sequences and structures underpins protein annotation, evolutionary analysis, large-scale functional inference, and the exploration of the protein “dark space”. The rapid growth of sequence and predicted-structure databases has spurred diverse search methods, yet their evaluation remains largely limited to fold-level similarity and is hampered by inconsistent benchmarking protocols.

Results

We present a unified benchmark for protein sequence and structure search. Using this framework, we evaluate 13 representative methods spanning sequence alignment, structure alignment, and representation-based approaches across multiple biologically relevant scenarios. Our results show pronounced, context-dependent differences among methods. Structure alignment methods excel at detecting fold-level and geometric similarity, while representation-based search approaches show advantages in capturing functional similarity at low sequence identity and in robustness to predicted structures. Notably, all evaluated methods show limited effectiveness on intrinsically disordered proteins.

Conclusions

This benchmark establishes a standardized framework for evaluating protein similarity search methods. It provides a practical resource for method selection and a foundation for developing next-generation approaches capable of addressing diverse homology search challenges.
