Integrative machine learning predicts activating kinase mutations for precision oncology

Yiming Wang
Fangping Wan
Zhangtao Chen
Jonathan Nukpezah
Tom Pan
Kathleen J. Stebe
Cesar de la Fuente-Nunez
Ravi Radhakrishnan


Abstract

Kinases are enzymes that catalyze phosphorylation and play crucial roles in a myriad of cellular regulatory processes and in homeostasis. Patient-specific genetic mutations that aberrantly activate kinases can profoundly influence cancer progression and alter drug efficacy. Predicting how such missense mutations across the human kinome affect protein function and cellular signaling is therefore a critical step toward personalized targeted therapy. Here, we present Kinome-AI, an integrative machine learning framework that classifies kinase missense mutations as activating or non-activating. Kinome-AI is trained on a rich multi-modal feature set that includes residue-level biochemical changes, sequence embeddings from a protein language model, and structural descriptors of kinase–ATP–substrate complexes derived from molecular modeling. Notably, detailed structural features were available for only 21% of mutants; we leverage these as privileged information during training to impute the missing structural data for the remaining ∼79%. This strategy boosts performance without requiring structural inputs for new (unseen) mutations. The resulting classifier achieves an area under the receiver operating characteristic curve (AUROC) of 0.85 and a balanced accuracy (BACC) of 0.76 across 1,003 mutations spanning 110 different kinases, substantially outperforming existing bioinformatics and general-purpose variant effect predictors. This work provides a robust approach to quantifying sequence–structure–function relationships of cancer-driving kinase mutations, paving the way for improved personalized cancer treatment.
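The two headline metrics can be made concrete with a small sketch. This is an illustrative NumPy implementation, not code from the paper; the function names `auroc` and `balanced_accuracy` and the toy labels and scores are hypothetical. AUROC is computed with the Mann-Whitney formulation (the fraction of activating/non-activating pairs ranked correctly, ties counted as 1/2), and BACC is the mean of sensitivity and specificity, which keeps the majority class from dominating the score.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive (activating) mutation receives a higher score than a
    randomly chosen negative (non-activating) one; ties count as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive score against every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on activating) and specificity (recall on
    non-activating), so the majority class cannot dominate the score."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return 0.5 * (sensitivity + specificity)


# Toy example (hypothetical): 2 activating and 3 non-activating mutations.
y = [1, 1, 0, 0, 0]
s = [0.9, 0.4, 0.6, 0.2, 0.1]
print(auroc(y, s))  # 5 of 6 positive/negative pairs are ranked correctly
print(balanced_accuracy(y, np.asarray(s) > 0.5))  # (1/2 + 2/3) / 2
```

The pairwise comparison uses memory proportional to the number of positive/negative pairs, which is unproblematic at the scale of roughly 1,000 mutations; a rank-based formula avoids that cost for much larger sets.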

Significance Statement

In cancer patients, numerous mutations in diverse protein kinases lead to marked differences in disease progression and drug response. Identifying which kinase mutations are activating in individual patients is therefore critical for precision oncology. Drawing inspiration from teacher–student (privileged information) learning, we developed a deep learning framework that integrates structural features from molecular simulations with sequence embeddings from protein language models. This approach enables accurate binary classification of the activation status of kinase mutations. Our study demonstrates how data-driven algorithms can leverage accumulated sequence and structural knowledge of known mutations to predict the effects of novel variants a priori. The model, termed Kinome-AI, shows significant promise for incorporation into personalized cancer therapy decision pipelines.
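The privileged-information idea (structural descriptors available for only a subset of mutants during training, imputed for the rest, and never required at prediction time) can be illustrated with a deliberately simple sketch. This is not the Kinome-AI architecture: it swaps in closed-form ridge regression as the sequence-to-structure "teacher" map and plain logistic regression as the classifier, and every function name, the regularization strength `lam`, and the learning-rate settings are hypothetical choices for illustration.

```python
import numpy as np


def _with_bias(X):
    """Append a constant column so the linear models get an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    Xb = _with_bias(X)
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def predict_ridge(W, X):
    return _with_bias(X) @ W


def impute_structural(seq_feats, struct_feats, has_struct, lam=1.0):
    """Learn a sequence -> structure map on the mutants that have structural
    descriptors (the privileged subset) and use it to fill in the rest.
    Rows with observed structure are kept unchanged; W is returned so that
    unseen mutations can be imputed later from sequence features alone."""
    W = fit_ridge(seq_feats[has_struct], struct_feats[has_struct], lam)
    imputed = struct_feats.copy()
    imputed[~has_struct] = predict_ridge(W, seq_feats[~has_struct])
    return imputed, W


def _sigmoid(z):
    # Clipping avoids overflow warnings in np.exp for very confident scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w


def predict_proba(w, X):
    """Predicted probability that each mutation is activating."""
    return _sigmoid(_with_bias(X) @ w)
```

For an unseen mutation with sequence features `x_new`, one would compute `s_new = predict_ridge(W, x_new)` and score `predict_proba(w, np.hstack([x_new, s_new]))`, so no molecular modeling is needed at inference. Note that with a purely linear imputer and a linear classifier the imputed columns are linear in the sequence features and add little on their own; a practical system would use a nonlinear imputer, a nonlinear classifier, or both.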

Version published on bioRxiv (10.1101/2025.10.14.682355), Oct 15, 2025
