Ligand Binding Prediction Using Protein Structure Graphs and Residual Graph Attention Networks

Abstract

Computational prediction of ligand–target interactions is a crucial part of modern drug discovery because it helps to bypass the high cost and labor demands of in vitro and in vivo screening. As bioactivity data accumulate, they provide opportunities to develop deep learning (DL) models with increasing predictive power. Conventionally, such models were limited either to very simplified representations of proteins or to ineffective voxelization of their 3D structures. Herein, we present PSG-BAR (Protein Structure Graph-Binding Affinity Regression), an approach that uses 3D structural information of proteins along with 2D graph representations of ligands. The method also introduces attention scores to selectively weight the protein regions that are most important for ligand binding. Results: The developed approach demonstrates state-of-the-art performance on several binding affinity benchmarking datasets. The attention-based pooling of protein graphs enables identification of surface residues as critical for protein–ligand binding. Finally, we validate our model predictions against an experimental assay on the viral main protease (Mpro), the hallmark target of the SARS-CoV-2 coronavirus.
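The attention-based pooling mentioned in the abstract can be illustrated with a minimal sketch. This is not the PSG-BAR implementation: the function name `attention_pool`, the single gating vector, and the toy embeddings are all assumptions made for illustration. It only shows the general idea of scoring each residue, normalizing the scores with a softmax, and using them both to pool the protein graph and to rank residues.

```python
import math

def attention_pool(node_features, gate_weights):
    """Pool per-residue embeddings into one protein vector (illustrative only).

    node_features: list of equal-length embedding lists, one per residue (hypothetical).
    gate_weights: gating vector; learned in a real model, fixed by hand here.
    Returns (pooled_vector, attention_scores).
    """
    # One raw score per residue: dot product with the gating vector.
    logits = [sum(x * w for x, w in zip(feat, gate_weights)) for feat in node_features]
    # Softmax, shifted by the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Attention-weighted sum of the residue embeddings.
    dim = len(node_features[0])
    pooled = [sum(s * feat[k] for s, feat in zip(scores, node_features)) for k in range(dim)]
    return pooled, scores

# Toy, made-up embeddings for three residues (not real data).
residues = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
gate = [1.0, 1.0]
protein_vec, attn = attention_pool(residues, gate)
# Index of the residue that receives the largest attention score.
top_residue = max(range(len(attn)), key=attn.__getitem__)
```

In a trained model, the residues with the largest scores would be the ones the pooling step treats as most relevant to binding, which is the kind of readout the abstract describes for surface residues.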

SciScore for 10.1101/2022.04.27.489750:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
First, we retrieved all protein-ligand pairs with associated dissociation constant (Kd) from BindingDB database [37].	BindingDB suggested: (BindingDB, RRID:SCR_000390)
The bioassay record (AID 1706) [42] by Scripps Research Institute provides PubChem Activity Score normalized to 100% observed primary inhibition.	PubChem suggested: (PubChem, RRID:SCR_004284)

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

PSG-BAR alleviates these limitations by using the entire protein structural graph and learning attention scores to selectively weight useful regions of the protein based on its interaction with the drug molecule. As a result, our method outperforms state-of-the-art affinity prediction methods on several benchmarking datasets. As such, the integration of protein structures helps to achieve better predictive results. This is mainly because 3D structures contain relevant information on the actual configuration of the binding pockets, which has immediate implications for ligand binding. These methods are bottlenecked by the availability of experimentally derived protein structures; however, with the advancement of NMR, X-ray crystallography, and cryo-EM techniques, more high-resolution PDB structures are being deposited than ever before. Furthermore, as a result of AlphaFold, even more predicted protein structures have become available. These developments enable effective advancement of deep learning-based approaches; in this work we validate this hypothesis by predicting experimentally determined measures of binding affinity on several protein targets across standard benchmark datasets. Particularly for the KIBA dataset, we show that augmentation with AlphaFold structures improves MSE by 11.1%. It should also be emphasized that augmentation of 3D protein structure information with 2D sequence descriptors can further improve model performance. Since protein sequences capture some level of structu...
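To make the notion of a protein structural graph concrete, here is a minimal sketch of one common way to turn 3D coordinates into a residue graph: connect two residues when their C-alpha atoms lie within a distance cutoff. The function name `contact_edges`, the 8 Å cutoff, and the toy coordinates are assumptions for illustration; the excerpt above does not state how PSG-BAR defines its edges.

```python
import math

def contact_edges(ca_coords, cutoff=8.0):
    """Return undirected edges (i, j), i < j, between residues whose
    C-alpha atoms are within `cutoff` angstroms (cutoff is an assumed value)."""
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges

# Made-up C-alpha coordinates for four residues along a line (not from a real PDB entry).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(coords)
```

With these toy coordinates the first three residues are mutually connected and the fourth is isolated. The same construction applies whether the coordinates come from an experimental structure or from a predicted one.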

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
