Probe-Based Identification of Metal-Binding Sites Using Deep Learning Representations
Abstract
Metalloproteins play indispensable roles in many cellular processes, incorporating metal ions as cofactors to catalyze biochemical reactions, stabilize protein structures, and mediate electron transfer. Despite their prevalence and importance, identifying metal-binding sites within proteins remains challenging because of the complexity of protein environments and the promiscuous binding behavior of metal ions. Although computational approaches for predicting metal-binding sites have been developed for decades, their accuracy is often limited by constrained methodologies and data scarcity. Here, we introduce PRIME, a hybrid deep learning framework that harnesses both evolutionary and structural signals to predict metal-binding sites with high accuracy and efficiency. Following the paradigm of deep representation learning, PRIME employs protein language models (PLMs) and pre-trained structure models (PSMs) to extract essential information from protein sequences and structures. PRIME integrates a novel probe generation algorithm that bridges sequence- and structure-based predictions by efficiently scanning candidate binding sites. The resulting framework outperforms existing methods across a wide range of metal ions, including both abundant ions such as Zn²⁺ and Ca²⁺ and less abundant ions such as K⁺ and Na⁺. In addition, ablation analysis shows that PSMs significantly enhance the accuracy of metal-binding site prediction. Case studies on AlphaFold2 models, along with the high prediction speed of PRIME, further demonstrate its potential for high-throughput applications in metalloproteomics.