Deep Learning enabled discovery of kinase drug targets in Pharos

Ádám M. Halász
Stephen L. Mathias
Srinjoy Das
Jeremy S. Edwards

Abstract

We use machine learning with standardized molecular structure and Gene Ontology data to predict ligand interactions for a set of human kinases. To do this, we leverage information from the TCRD/Pharos database, developed and maintained as part of the Illuminating the Druggable Genome (IDG) project.

Pharos collects biochemical and clinically relevant information on a large set of biologically important human proteins from publicly available sources, including scientific publications and specialized databases. The 635 kinases listed in Pharos are classified into levels that reflect the relative amount and type of accumulated information. Importantly, molecular structure and Gene Ontology annotations are available for the entire set, but only 455 of the kinases have recorded ligand affinity data; the remaining 180 have none.

We developed a deep neural network framework that predicts the ligand affinity profile of a kinase from generally available information (molecular structure and Gene Ontology annotations). The input is encoded as a 2,770-dimensional binary vector; the output consists of predicted affinity values for interactions between that kinase and candidate ligands.
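As a rough illustration of the input/output shape described above, the sketch below multi-hot encodes a kinase's annotations into a fixed-length binary vector and passes it through a small fully connected network with one output per ligand cluster. The abstract does not specify the architecture, so everything here is an assumption: the five-entry feature vocabulary (standing in for the 2,770 real features), the single ReLU hidden layer, the random untrained weights, and the sigmoid output, which presumes relative affinities rescaled to [0, 1]. Only the Python standard library is used.

```python
import math
import random

# HYPOTHETICAL feature vocabulary standing in for the 2,770 binary
# molecular-structure and Gene Ontology features used in the paper.
FEATURES = ["GO:0004672", "GO:0005524", "GO:0006468", "domain:Pkinase", "domain:SH2"]
INDEX = {name: i for i, name in enumerate(FEATURES)}
N_CLUSTERS = 4  # stands in for the 5,275 ligand clusters


def encode(annotations):
    """Multi-hot encode a kinase's annotations as a fixed-length 0/1 list."""
    vec = [0] * len(FEATURES)
    for name in annotations:
        if name in INDEX:  # annotations outside the vocabulary are ignored
            vec[INDEX[name]] = 1
    return vec


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights in [-0.5, 0.5] and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, bias):
    """Fully connected layer: W @ x + b, one entry per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(x, layers):
    """ReLU hidden layers; the sigmoid output gives one score in (0, 1) per cluster."""
    h = x
    for weights, bias in layers[:-1]:
        h = [max(0.0, a) for a in dense(h, weights, bias)]
    weights, bias = layers[-1]
    return [1.0 / (1.0 + math.exp(-a)) for a in dense(h, weights, bias)]


rng = random.Random(0)
layers = [init_layer(len(FEATURES), 8, rng), init_layer(8, N_CLUSTERS, rng)]

# "GO:9999999" is a placeholder that is not in the vocabulary, so it is dropped.
x = encode(["GO:0004672", "domain:Pkinase", "GO:9999999"])
print(x)  # [1, 0, 0, 1, 0]
scores = forward(x, layers)
print([round(s, 3) for s in scores])
```

In practice the weights would be fitted to the known kinase/ligand-cluster affinities with a deep learning framework; this pure-Python version only shows how an encoded kinase flows through the network to a per-cluster score vector.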

To cope with the very large number of possible ligands (58,800) and the sparsity of available binding data, we grouped the ligands into 5,275 clusters based on structural similarity. The model is trained to predict likely interactions between kinases and these ligand clusters rather than individual ligands.
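The abstract does not say which structural similarity measure or clustering algorithm was used. One common choice, assumed here purely for illustration, is to represent each ligand as a binary fingerprint (a set of on-bits), compare fingerprints with Tanimoto similarity, and group them with a greedy leader clustering at a fixed threshold. The sketch also shows one hypothetical way to turn per-ligand affinities into per-cluster training targets by keeping the strongest affinity in each cluster; the 0.5 threshold, the toy fingerprints, and the max aggregation are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold):
    """Greedy leader clustering: each ligand joins the first cluster whose leader
    is at least `threshold` similar, otherwise it starts a new cluster.
    Returns a list mapping ligand index -> cluster id."""
    leaders = []  # fingerprint of each cluster's first member
    assignment = []
    for fp in fingerprints:
        for cid, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                assignment.append(cid)
                break
        else:  # no sufficiently similar leader: open a new cluster
            leaders.append(fp)
            assignment.append(len(leaders) - 1)
    return assignment


def cluster_targets(affinities, assignment, n_clusters):
    """Collapse one kinase's per-ligand affinities into per-cluster targets,
    keeping the strongest value per cluster (None means no data)."""
    targets = [None] * n_clusters
    for ligand, value in affinities.items():
        cid = assignment[ligand]
        if targets[cid] is None or value > targets[cid]:
            targets[cid] = value
    return targets


# Toy fingerprints (sets of on-bit positions) for five hypothetical ligands.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
assignment = leader_cluster(fps, threshold=0.5)
print(assignment)  # [0, 0, 1, 1, 2]

# Hypothetical affinities (-log10 scale, higher = stronger) of one kinase for ligands 0, 1, 3.
print(cluster_targets({0: 6.2, 1: 7.5, 3: 5.0}, assignment, n_clusters=3))  # [7.5, 5.0, None]
```

Leader clustering makes a single, order-dependent pass, which keeps it cheap for tens of thousands of ligands; schemes that first order candidates by their number of close neighbors tend to give more stable clusters at a higher computational cost.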

Our goal is to identify, for a given kinase, sets of likely ligand partners with high predicted relative affinity. We evaluate performance by how efficiently the model recovers known ligand partners of documented kinases that were held out from the training data. Our results indicate that the framework can identify ligand sets that contain a significant fraction of the correct (known) ligand partners.
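The evaluation described above can be read as a recall-at-k measure: rank the ligand clusters by predicted score for a kinase held out from training, keep the top k, and count what fraction of the clusters containing its known ligands were recovered. This is a minimal sketch under that interpretation; the abstract does not give the exact metric or the value of k, and the scores and cluster IDs below are made up.

```python
def top_k_clusters(scores, k):
    """Indices of the k highest-scoring ligand clusters (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def recall_at_k(scores, known_clusters, k):
    """Fraction of a held-out kinase's known ligand clusters that appear
    among its k top-ranked predicted clusters."""
    known = set(known_clusters)
    if not known:
        raise ValueError("kinase has no known ligand clusters")
    hits = set(top_k_clusters(scores, k)) & known
    return len(hits) / len(known)


# Hypothetical predicted scores for one held-out kinase over 6 ligand clusters,
# and the clusters that contain its known (documented) ligands.
scores = [0.10, 0.85, 0.40, 0.92, 0.05, 0.60]
known = {1, 2, 4}
for k in (1, 2, 3, 6):
    print(k, round(recall_at_k(scores, known, k), 3))
# 1 0.0
# 2 0.333
# 3 0.333
# 6 1.0
```

Choosing k of N clusters uniformly at random recovers k/N of the known clusters in expectation, so recall@k is most informative when compared against that baseline.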

This version was published on bioRxiv on Oct 11, 2024 (DOI: 10.1101/2024.10.08.612754).
