ProtAttn-QuadNet: An attention-based deep learning framework for protein–protein interaction prediction using ProtBERT embeddings
Abstract
Protein–protein interactions (PPIs) form the backbone of most cellular processes, governing signal transduction, gene regulation, and metabolic control. However, experimental approaches to identifying PPIs remain expensive, laborious, and often incomplete. Recent advances in protein language models (PLMs) have transformed sequence-based PPI prediction by enabling deep contextual encoding of biochemical and structural information directly from amino acid sequences. Building upon this progress, we present ProtAttn-QuadNet, an attention-based deep learning framework that leverages ProtBERT embeddings to model reciprocal dependencies between protein pairs. The proposed model employs a quad-stream attention mechanism that integrates individual protein features, synergistic interactions, and complementary differences through multi-level self- and cross-attention layers. This architecture enables the discovery of fine-grained relational patterns while ensuring balanced bidirectional modeling of interacting proteins. Evaluated on a large-scale dataset derived from UniProt, ProtAttn-QuadNet achieves 97.16% accuracy (AUC-ROC 99.00%) on the balanced dataset and 99.19% accuracy (AUC-ROC 99.76%) on the oversampled dataset, surpassing several recent state-of-the-art PPI prediction methods. Statistical validation using the Chi-square and Wilcoxon signed-rank tests supports the significance and reliability of the model’s predictions. ProtAttn-QuadNet offers a powerful computational framework for large-scale PPI prediction.
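To make the quad-stream idea more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the four streams are built, so the sketch assumes the synergistic stream is an element-wise product and the complementary-difference stream is an absolute difference. It also uses random vectors in place of pooled ProtBERT embeddings (1024 is ProtBERT's hidden size) and a single attention step with no learned weights. The function names `quad_streams`, `self_attention`, and `pair_representation` are hypothetical.

```python
import numpy as np


def quad_streams(a, b):
    """Stack four views of a protein pair (assumed forms, see lead-in):
    protein A, protein B, element-wise product, absolute difference."""
    return np.stack([a, b, a * b, np.abs(a - b)])  # shape (4, d)


def self_attention(x):
    """Single-head scaled dot-product self-attention with no learned weights."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # (4, 4) stream-to-stream scores
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


def pair_representation(a, b):
    """Fuse the attended streams into one vector for a downstream classifier."""
    attended, _ = self_attention(quad_streams(a, b))
    # Summing the two single-protein streams keeps the result unchanged
    # when A and B are swapped (one way to get bidirectional balance).
    return np.concatenate([attended[0] + attended[1], attended[2], attended[3]])


# Random stand-ins for two pooled ProtBERT embeddings (illustrative only).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=1024)
emb_b = rng.normal(size=1024)
features = pair_representation(emb_a, emb_b)
```

Because the product and absolute difference are symmetric and attention without positional encoding treats the streams as an unordered set, `pair_representation(emb_a, emb_b)` and `pair_representation(emb_b, emb_a)` agree up to floating-point error. A trained model would add learned projections, cross-attention, and a classification head on top of such features.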
Author summary
Proteins work together to carry out almost every function in living cells, from sending signals to controlling metabolism. Knowing which proteins interact with each other helps scientists understand how cells work and how diseases develop. However, finding these interactions in the laboratory is often slow, costly, and incomplete. In this study, a computational model called ProtAttn-QuadNet is developed to predict protein–protein interactions using only the amino acid sequences of proteins. The model analyses each pair of proteins to find shared features and differences that indicate whether they interact. It also uses a set of attention layers that allow the model to focus on the most relevant sequence patterns. When tested on a large protein dataset, ProtAttn-QuadNet produced highly accurate and consistent results, performing better than several existing methods. These results suggest that ProtAttn-QuadNet can serve as a reliable tool for studying protein networks and may help guide future research in biology, medicine, and drug development.