ProSiteHunter: A unified framework for sequence-based prediction of protein-nucleic acid and protein-protein binding sites

Abstract

Accurate prediction of protein binding sites is essential for elucidating protein function, understanding molecular interaction mechanisms, and facilitating drug design. However, existing sequence-based approaches are often designed for specific binding-site types and therefore lack generality, whereas structure-based methods typically rely on high-quality structural models, which limits their applicability. Here, we introduce ProSiteHunter, a unified sequence-based framework for protein binding-site prediction. It integrates a fine-tuned protein language model (SiteT5) with a multi-source feature-fusion network that incorporates evolutionary, geometric, and statistical features, and it combines bidirectional semantics, local associations, and global dependencies to characterize binding sites comprehensively. The method was systematically evaluated on diverse binding-site prediction tasks, where ProSiteHunter achieved a 39.1% average improvement in PRAUC on the protein-DNA, protein-RNA, and protein-protein tasks and a 7.4% PRAUC improvement on the particularly challenging antibody-antigen task over state-of-the-art methods. Moreover, ProSiteHunter can identify local flexible sites that complement AlphaFold3 predictions and can improve the accuracy of antibody-antigen interaction prediction. These results highlight ProSiteHunter as an efficient and unified approach for accurate and robust prediction of diverse protein binding sites.
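
For readers unfamiliar with the PRAUC metric reported above, the following is a minimal sketch of how an area under the precision-recall curve can be computed for per-residue binding-site predictions. It uses the step-wise "average precision" estimator, which is one common way to summarize that curve; the abstract does not say which estimator the authors used, so this choice is an assumption. The function name `average_precision` and the example labels and scores are hypothetical and are not taken from the article.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 0/1 per residue (1 = annotated binding residue).
    scores: predicted binding score per residue (higher = more likely).
    Ties between scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (binding) residue")

    # Rank residues from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


# Hypothetical toy protein with 8 residues, 3 of which bind.
labels = [0, 1, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.90, 0.30, 0.35, 0.20, 0.80, 0.05, 0.60]
ap = average_precision(labels, scores)
```

Because binding residues are usually a small minority of a sequence, a precision-recall summary of this kind is more sensitive to false positives than ROC-based metrics, which is a likely reason it is used for this task.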
