Allo-PED: Leveraging protein language models and structure features for allosteric site prediction
Abstract
Allosteric regulation plays a pivotal role in modulating protein function, and allosteric sites represent promising targets for drug discovery. However, identifying allosteric sites remains challenging because of their structural and evolutionary diversity. Here, we present AlloPED, a novel framework that combines protein language models and machine learning to predict allosteric sites with high accuracy. AlloPED consists of two modules: AlloPED-pocket, an ensemble model that leverages physicochemical features to predict allosteric pockets, and AlloPED-site, a dilated convolutional neural network (DCNN) augmented with a comprehensive attention mechanism for residue-level prediction. AlloPED-pocket achieves state-of-the-art performance on benchmark datasets, with an MCC of 0.544 and an AUC of 0.920, outperforming existing methods such as AllositePro and PARS. AlloPED-site further refines predictions using high-dimensional sequence embeddings from the ProtT5 protein language model, achieving a precision of 0.601, a recall of 0.422, and a specificity of 0.661. These results highlight the effectiveness of integrating ensemble learning and deep learning for allosteric site prediction. AlloPED also identifies critical determinants of allosteric sites, including residue clustering coefficients, van der Waals volume, and hydrophobic microenvironments. In summary, this framework provides a robust tool for advancing our understanding of allosteric regulation and for facilitating structure-based drug design.
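The abstract reports MCC and AUC for AlloPED-pocket and precision, recall, and specificity for AlloPED-site. As a reminder of what these numbers measure, the sketch below computes them from their standard definitions for a binary task (1 = allosteric pocket or residue, 0 = not). It is a hypothetical illustration, not the authors' evaluation code: the abstract does not state decision thresholds or how scores are aggregated across proteins, and the toy labels and scores in the usage example are made up.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = allosteric."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Precision, recall (sensitivity), specificity and MCC from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "mcc": mcc}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as 1/2), i.e. the normalized Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (made up): 3 allosteric and 5 non-allosteric candidates.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

On this toy data there are 2 true positives, 1 false positive, 4 true negatives and 1 false negative, giving precision and recall of 2/3, specificity of 0.8, and MCC = (2·4 − 1·1)/√(3·3·5·5) = 7/15. MCC is often preferred over accuracy for tasks like this because allosteric pockets and residues are a small minority of all candidates, and MCC stays informative under class imbalance.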