GL-E2EATP: improving protein-ATP binding residue prediction using global and local embedding of protein language model

Abstract

Identification of ATP-binding residues in proteins is of paramount importance for elucidating the mechanisms underlying protein functions and advancing drug discovery efforts. Several computational approaches have been developed for predicting ATP-binding sites, yet their predictive performance remains suboptimal, primarily due to inadequate feature descriptors and learning models. In this study, we developed a novel end-to-end deep learning (DL) model, called GL-E2EATP, for predicting ATP-binding residues with improved accuracy. The proposed model uses a self-supervised learning strategy, extracting both global and local embeddings that a protein language model generates from protein sequences. Specifically, we leverage a pre-trained DL-based biological language model, ESM2, to autonomously generate biologically relevant features. Building upon ESM2, two different neural network modules, i.e., convolutional layers and multi-head attention layers, are employed to separately extract global information for whole protein sequences and local information for the potential ATP-binding residues. Empirical evaluations conducted on two independent test datasets reveal that GL-E2EATP outperforms existing ATP-binding residue prediction methods, achieving superior Matthews correlation coefficient (MCC), area under the ROC curve (AUC), and area under the Precision-Recall curve (AUCPR) metrics. Comprehensive analyses suggest that GL-E2EATP will serve as an efficient solution for large-scale prediction of ATP-binding sites from protein sequences. The standalone package for GL-E2EATP is downloadable at https://github.com/Robin8990/gl-e2eatp for academic use.
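
For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of how MCC can be computed for per-residue binary predictions, using the standard confusion-matrix definition. The function name `matthews_corrcoef_binary` and the toy labels are illustrative assumptions and are not taken from the GL-E2EATP package or its datasets.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Labels are assumed to be 1 (binding residue) or 0 (non-binding residue).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical per-residue labels for a 10-residue toy sequence
# (not real data): 3 true binding residues, 3 predicted.
y_true = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

mcc = matthews_corrcoef_binary(y_true, y_pred)
print(f"MCC = {mcc:.4f}")
```

In this toy case there are 2 true positives, 6 true negatives, 1 false positive, and 1 false negative, so MCC = (2·6 − 1·1) / √(3·3·7·7) = 11/21. Because MCC uses all four confusion-matrix cells, it stays informative when one class is much rarer than the other, which is typically the situation for binding-residue prediction.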
