Integrative machine learning predicts activating kinase mutations for precision oncology
Abstract
Kinases are enzymes that catalyze phosphorylation and play crucial roles in a myriad of cellular regulatory processes and homeostasis. Patient-specific genetic mutations that aberrantly activate kinases can profoundly influence cancer progression and alter drug efficacy. Predicting the impact of such missense mutations across the human kinome on protein function and cellular signaling is therefore a critical step toward personalized targeted therapy. Here, we present Kinome-AI, an integrative machine learning framework that classifies kinase missense mutations as activating or non-activating. Kinome-AI is trained on a rich multi-modal feature set, including residue-level biochemical changes, sequence embeddings from a protein language model, and structural descriptors of kinase–ATP–substrate complexes derived from molecular modeling. Notably, detailed structural features were available for only 21% of mutants; we leverage these as privileged information during training to impute missing structural data for the remaining ∼79%. This strategy boosts performance without requiring structural inputs for new (unseen) mutations. The resulting classifier achieves an area under the receiver operating characteristic curve (AUROC) of 0.85 and a balanced accuracy (BACC) of 0.76 across 1,003 mutations spanning 110 different kinases, substantially outperforming existing bioinformatics tools and general-purpose variant effect predictors. This work provides a robust approach to quantify sequence–structure–function relationships of cancer-driving kinase mutations, paving the way for improved personalized cancer treatment.
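For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUROC and balanced accuracy can be computed for a binary activating/non-activating classifier. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not data or settings from the study.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical toy data: 1 = activating, 0 = non-activating.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]
preds = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

toy_auroc = auroc(labels, scores)
toy_bacc = balanced_accuracy(labels, preds)
print(f"AUROC={toy_auroc:.3f}  BACC={toy_bacc:.3f}")
```

Balanced accuracy weights both classes equally, which makes it a reasonable companion to AUROC when one class is rarer than the other.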
Significance Statement
In cancer patients, numerous mutations in diverse protein kinases lead to marked differences in disease progression and drug response. Identifying which kinase mutations are activating in individual patients is therefore critical for precision oncology. Drawing inspiration from teacher–student (privileged information) learning, we developed a deep learning framework that integrates structural features from molecular simulations with sequence embeddings from protein language models. This approach enables accurate binary classification of the activation status of kinase mutations. Our study demonstrates how data-driven algorithms can leverage accumulated sequence and structural knowledge of known mutations to predict the effects of novel variants a priori. The model, termed Kinome-AI, shows significant promise for incorporation into personalized cancer therapy decision pipelines.
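The general idea of using structural features as privileged information can be illustrated with a deliberately simplified sketch. It is not the Kinome-AI architecture: a k-nearest-neighbour imputer and a nearest-centroid classifier stand in for the deep learning components, and all names (`knn_impute`, `ImputingClassifier`) and data values are hypothetical. The point it shows is that structural descriptors known for only some training mutants can be imputed from always-available sequence features, so that prediction for a new mutation needs sequence features alone.

```python
import math


def knn_impute(x, known_x, known_s, k=1):
    """Impute a structural vector for x as the mean over its k nearest
    structure-annotated neighbours in sequence-feature space."""
    order = sorted(range(len(known_x)),
                   key=lambda i: math.dist(x, known_x[i]))[:k]
    dim = len(known_s[0])
    return [sum(known_s[i][d] for i in order) / len(order) for d in range(dim)]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class ImputingClassifier:
    """Nearest-centroid classifier on sequence + (observed or imputed) structure."""

    def fit(self, seq, struct, labels, k=1):
        # struct[i] is None when mutant i has no structural descriptors.
        self.k = k
        self.known_x = [x for x, s in zip(seq, struct) if s is not None]
        self.known_s = [s for s in struct if s is not None]
        full = [x + (s if s is not None
                     else knn_impute(x, self.known_x, self.known_s, k))
                for x, s in zip(seq, struct)]
        self.centroids = {c: centroid([f for f, y in zip(full, labels) if y == c])
                          for c in set(labels)}
        return self

    def predict(self, x):
        # Only sequence features are required here; structure is imputed.
        f = x + knn_impute(x, self.known_x, self.known_s, self.k)
        return min(self.centroids, key=lambda c: math.dist(f, self.centroids[c]))


# Hypothetical toy data: two sequence features per mutant, one structural
# descriptor available for only two of the six training mutants.
seq = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
struct = [[0.0], None, None, [1.0], None, None]
labels = [0, 0, 0, 1, 1, 1]  # 1 = activating, 0 = non-activating

model = ImputingClassifier().fit(seq, struct, labels, k=1)
print(model.predict([0.05, 0.05]), model.predict([0.95, 1.0]))
```

In this toy setting the imputer plays the role of the structure-aware "teacher" signal, while `predict` behaves like the "student" that only ever sees sequence-derived inputs.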