Integrating Drug-like Moieties and Binding Site Evolution for Kinase Inhibitor Prediction Using Ensemble Learning Models

Wei-lin Lin
Yen-Chao Hsu
Jinn-Moon Yang

Abstract

Protein kinases play a pivotal role in regulating cellular signaling pathways, and their dysregulation is closely associated with numerous diseases, including cancer, autoimmune disorders, and inflammation. Although over 100,000 kinase inhibitors have been developed, only a small fraction have achieved FDA approval, primarily because of off-target effects that stem from the high conservation of kinase binding sites. To address this challenge, we present an ensemble learning framework that integrates chemical and protein-level information to improve the prediction of selective kinase inhibitors. On the compound side, we construct a 1,048-dimensional feature representation encompassing topological fingerprints, drug-like moieties, atomic composition, and stereochemical descriptors. On the protein side, we develop a 1,700-dimensional representation of kinase binding site environments using multiple sequence alignment and evolutionary conservation information. Comprehensive evaluations across 131 human kinases show that integrating these features significantly improves model performance, achieving 93.6% accuracy on an independent test set. Furthermore, SHAP-based model interpretation reveals that high-impact features correspond to known binding motifs, such as the P-loop, hinge region, and DFG motif, as confirmed by crystal structure validation. Lastly, we apply the model to a curated dataset of flavonoid-like compounds and identify potential natural product-derived kinase inhibitors. This study demonstrates that the proposed integrative approach not only enhances predictive accuracy but also provides interpretable insights into kinase-ligand interactions, offering a promising direction for rational kinase inhibitor design.
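
To make the feature integration described in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. Only the two feature lengths (1,048 and 1,700) come from the abstract. Joining the two vectors by simple concatenation, combining base models by soft voting, the helper names `build_pair_features` and `soft_vote`, and all numeric inputs are assumptions made purely for illustration.

```python
import random

COMPOUND_DIM = 1048  # compound feature length reported in the abstract
PROTEIN_DIM = 1700   # binding-site feature length reported in the abstract


def build_pair_features(compound_vec, protein_vec):
    """Concatenate one compound vector and one kinase vector (assumed scheme)."""
    if len(compound_vec) != COMPOUND_DIM or len(protein_vec) != PROTEIN_DIM:
        raise ValueError("unexpected feature length")
    return list(compound_vec) + list(protein_vec)


def soft_vote(probabilities, threshold=0.5):
    """Average per-model inhibitor probabilities and threshold the mean."""
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, mean_prob >= threshold


# Placeholder data: random values stand in for real descriptors.
rng = random.Random(0)
compound = [rng.randint(0, 1) for _ in range(COMPOUND_DIM)]
kinase = [rng.random() for _ in range(PROTEIN_DIM)]

pair = build_pair_features(compound, kinase)
print(len(pair))  # 2748

# Hypothetical outputs of three base models for this compound-kinase pair.
mean_prob, is_inhibitor = soft_vote([0.9, 0.6, 0.75])
print(round(mean_prob, 2), is_inhibitor)  # 0.75 True
```

Under these assumptions each compound-kinase pair is described by 2,748 values. The abstract does not specify which base learners or combination rule the ensemble uses, so the voting step here is only a stand-in.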

Version published to 10.1101/2025.05.28.656738 on bioRxiv
Jun 1, 2025

Related articles

Multi-Modal Ensemble Learning for TLR4 Binding Prediction: Addressing Data Scarcity and Leakage in Small Molecule Drug Discovery

This article has 3 authors:
1. Brandon Yee
2. Maximilian Rutkowski
3. Wilson Collins
This article has no evaluations
Latest version Jan 28, 2026

Integrating Computational Biology in Modern Drug Discovery: A Synergistic Approach of Structure-Based, Ligand-Based, and Network Pharmacology Strategies

This article has 4 authors:
1. Cromwel Tepap Zemnou
2. Gabriel Tchuente Kamsu
3. Ramelle Ngakam
4. Etienne Junior Tcheumeni
This article has no evaluations
Latest version Jan 29, 2026

AI–Driven Design of Miniproteins as Potential Allosteric Modulators

This article has 4 authors:
1. Xin Liu
2. Yunxiang Sun
3. Huaqiong Li
4. Zhiqiang Yan
This article has no evaluations
Latest version Jan 16, 2026
