Resolving Discrepancies in Kinase Activity Labels Using Machine Learning

Mohammed EL Moumni


Abstract

Protein kinases regulate essential cellular processes by transitioning between active and inactive conformational states, a switch governed primarily by structural rearrangements within the activation segment. Accurately distinguishing these states is critical for kinase research and drug discovery. In this study, we developed a classification framework that uses 15 geometric descriptors derived from the activation segment to differentiate kinase conformations. Comparing kinase activity labels across resources, we identified over 300 discrepancies. We trained multiple machine learning models, including Support Vector Machines (SVM), Logistic Regression, Random Forest, and XGBoost, using only the non-conflicting labels. To resolve the inconsistencies, we applied benchmarking, randomized search, Bayesian optimization, and coordinate descent techniques. We also explored probabilistic models, namely Kernel Density Estimation (KDE) and Gaussian Mixture Models (GMM), which classify kinases by density estimation. Random Forest consistently achieved perfect classification performance and emerged as the most reliable model, with XGBoost as a strong alternative. By distinguishing active from inactive kinases, our classification scheme provides a robust tool for resolving conflicting labels, with important implications for structure-based drug discovery and guided drug design.
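The workflow in the abstract (train only on structures whose labels agree, then classify the conflicting ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the structure identifiers, the two label sources `labels_a` and `labels_b`, the 2-dimensional toy features (standing in for the 15 geometric descriptors), and the fixed bandwidth are all assumptions made for illustration. The classifier is a plain Gaussian-kernel KDE, chosen because it needs only the standard library.

```python
import math

def split_by_agreement(labels_a, labels_b):
    """Split structures annotated in both sources into agreed labels and conflicts."""
    agreed, conflicting = {}, []
    for sid in sorted(labels_a.keys() & labels_b.keys()):
        if labels_a[sid] == labels_b[sid]:
            agreed[sid] = labels_a[sid]
        else:
            conflicting.append(sid)
    return agreed, conflicting

def log_kde(x, points, bandwidth):
    """Log of an isotropic Gaussian-kernel density estimate at x."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2 * math.pi * bandwidth ** 2)
    exps = [
        -sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / (2 * bandwidth ** 2)
        for p in points
    ]
    m = max(exps)  # log-sum-exp trick for numerical stability
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exps)) - math.log(len(points))

def classify(x, train, bandwidth=1.0):
    """Return the label maximizing log prior + log class-conditional density."""
    total = sum(len(pts) for pts in train.values())
    scores = {
        label: math.log(len(pts) / total) + log_kde(x, pts, bandwidth)
        for label, pts in train.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: 2 features per structure instead of 15 descriptors.
features = {
    "s1": (0.0, 0.1), "s2": (0.2, -0.1), "s3": (5.0, 5.1),
    "s4": (4.8, 5.2), "s5": (0.1, 0.0), "s6": (5.1, 4.9),
}
labels_a = {"s1": "active", "s2": "active", "s3": "inactive",
            "s4": "inactive", "s5": "active", "s6": "active"}
labels_b = {"s1": "active", "s2": "active", "s3": "inactive",
            "s4": "inactive", "s5": "inactive", "s6": "inactive"}

agreed, conflicting = split_by_agreement(labels_a, labels_b)

# Train only on structures whose labels agree across both sources.
train = {}
for sid, label in agreed.items():
    train.setdefault(label, []).append(features[sid])

# Assign a label to each structure whose annotations conflict.
resolved = {sid: classify(features[sid], train) for sid in conflicting}
print(conflicting, resolved)
```

In practice the bandwidth would be tuned rather than fixed, and a tree ensemble such as Random Forest could replace the KDE step without changing the split-then-resolve structure.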

Version published to 10.20944/preprints202506.0609.v1
Jun 9, 2025

Related articles

Feature-Optimized Machine Learning Benchmarking for Protein Interface Prediction in Permanent Homodimer Complexes with Distinct Structural Features

This article has 4 authors:
1. Tayyip Topuz
2. Zeki Erdem
3. Halil Bisgin
4. E. Demet Akten
This article has no evaluations
Latest version Feb 2, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations
Latest version Dec 11, 2025
