GenSPARC: Generalized Structure- and Property-Aware Representations of Language Models for Compound-Protein Interaction Prediction
Abstract
Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery by aiding the identification of binding interactions and the estimation of binding affinities between small molecules and proteins. Current deep learning models rely heavily on sequence-based representations and suffer from a lack of labeled data, which restricts their accuracy and generalizability. To overcome these challenges, we propose GenSPARC (Generalized Structure- and Property-Aware Representation for CPI prediction), a deep learning model that leverages structure-aware protein representations derived from AlphaFold2 predictions and Foldseek’s 3D interaction alphabet. Compound features were extracted using graph convolutional networks and a pretrained chemical language model, thereby ensuring comprehensive multimodal representation. A novel attention mechanism further enhanced interaction modeling by capturing intricate binding patterns. GenSPARC was validated successfully on multiple CPI benchmark datasets, demonstrating strong generalizability across challenging data splits and competitive results in virtual screening tasks. Therefore, GenSPARC will substantially advance artificial intelligence-driven drug discovery.
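The abstract does not specify how the attention mechanism is built, so the following is only a generic illustration of the underlying idea: compound-side feature vectors attending over protein-side feature vectors with standard scaled dot-product attention. It is not GenSPARC's mechanism. The function names (`softmax`, `cross_attention`) and the two-dimensional toy embeddings are hypothetical and chosen purely for readability.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention: each query attends over all keys.

    Assumption for illustration only; the paper's own attention
    mechanism is not described in the abstract.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Made-up toy embeddings: 2 compound atoms, 3 protein residues.
atom_embeddings = [[1.0, 0.0], [0.0, 1.0]]
residue_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended, attn = cross_attention(atom_embeddings, residue_embeddings, residue_embeddings)
```

Here `attn[i][j]` is the weight that atom `i` places on residue `j`; each row sums to 1, and residues whose embeddings align with an atom's embedding receive larger weights. In a real model the embeddings would come from learned encoders and the queries, keys, and values would pass through learned projections.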