StoPred: Accurate Stoichiometry Prediction for Protein Complexes Using Protein Language Models and Graph Attention
Abstract
Proteins often function as part of complexes, and the specific stoichiometry of these assemblies is critical for their biological roles, yet experimental determination of assembly composition remains challenging and existing computational methods for stoichiometry prediction are limited. These approaches rely on template-based searches or require a predefined stoichiometry for structure prediction, hampering their applicability to proteins without close homologs or known assembly states. Recent advances using protein language models (pLMs) have enabled sequence-based prediction of homo-oligomer stoichiometry, but these methods are not applicable to hetero-oligomeric complexes and do not fully leverage inter-subunit relationships. Here, we present StoPred, a method that predicts the stoichiometry of protein complexes by integrating pLM embeddings with a graph attention network to model subunit-level interactions. StoPred infers stoichiometry directly from sequence or structure features for both homo- and hetero-oligomers, without requiring template assemblies or a predefined composition. We benchmark StoPred against deep learning-based and template-based methods and show that it achieves improved accuracy and efficiency across curated and blind datasets, with up to 16% and 41% higher top-1 accuracy for homomeric and heteromeric complexes, respectively, compared with the strongest prior method on our held-out test dataset. More importantly, StoPred is the first deep learning-based method capable of accurately predicting the stoichiometry of hetero-oligomeric complexes.
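To make the architecture described above concrete, the sketch below shows one way per-subunit pLM embeddings could be passed through graph attention layers over a subunit graph to predict per-subunit copy numbers. This is a minimal illustration assuming PyTorch and PyTorch Geometric; the class name, dimensions, two-layer depth, and classification head over copy numbers are hypothetical and are not taken from the StoPred implementation.

```python
# Hypothetical sketch: pLM embeddings per subunit -> graph attention over the
# subunit graph -> copy-number logits per subunit. Not the authors' code.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv


class StoichiometryGATSketch(nn.Module):
    def __init__(self, embed_dim: int = 1280, hidden_dim: int = 256,
                 heads: int = 4, max_copies: int = 12):
        super().__init__()
        # Project pooled pLM embeddings (e.g., mean-pooled per-sequence vectors)
        self.proj = nn.Linear(embed_dim, hidden_dim)
        # Graph attention layers over the subunit graph (e.g., edges between
        # distinct chains of a candidate complex)
        self.gat1 = GATConv(hidden_dim, hidden_dim, heads=heads, concat=False)
        self.gat2 = GATConv(hidden_dim, hidden_dim, heads=heads, concat=False)
        # Per-subunit classifier over candidate copy numbers 1..max_copies
        self.head = nn.Linear(hidden_dim, max_copies)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x:          [num_subunits, embed_dim] pooled pLM embeddings
        # edge_index: [2, num_edges] subunit-subunit connectivity
        h = torch.relu(self.proj(x))
        h = torch.relu(self.gat1(h, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.head(h)  # copy-number logits per subunit


if __name__ == "__main__":
    # Toy usage: a heteromer with two unique chains connected bidirectionally
    model = StoichiometryGATSketch()
    x = torch.randn(2, 1280)                     # one pLM embedding per unique chain
    edge_index = torch.tensor([[0, 1], [1, 0]])  # edges A->B and B->A
    logits = model(x, edge_index)
    print(logits.argmax(dim=-1) + 1)             # predicted copy number per chain
```

In this reading, a homomer reduces to a single-node graph, while a heteromer's stoichiometry is obtained by combining the per-subunit copy-number predictions, which is consistent with the abstract's claim that the same model handles both cases.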