PROtein Analytics for Kinase Therapeutic Inhibitor Variants (PROAKTIV): A Machine learning approach to predict efficacy of tyrosine kinase inhibitors against mutations in EGFR, ALK and BRAF
Abstract
Introduction: The efficacy of targeted therapies in non-small cell lung cancer (NSCLC) is challenged by acquired resistance driven by on-target mutations in kinases such as EGFR, ALK, and BRAF. Predicting the functional impact of these mutations on tyrosine kinase inhibitor (TKI) sensitivity is therefore critical for personalized treatment. This study presents a foundational machine learning framework for predicting ligand bioactivity against mutated kinases from sequence data, using EGFR, ALK, and BRAF as proof-of-concept models. The results demonstrate the feasibility of developing a generalized predictive model applicable across the kinase family.

Methods: An automated Python pipeline was developed to curate mutation and bioactivity data from a comprehensive dataset of 25,412 published in vitro pIC50 values for EGFR, ALK, and BRAF variants. Twelve deep learning architectures, combining different protein and ligand encoders, were trained and evaluated. Model performance was assessed using mean squared error (MSE), the coefficient of determination (R²), and the Pearson correlation coefficient (PCC), with uncertainty quantified via Monte Carlo dropout.

Results: The best-performing model demonstrated robust predictive accuracy, achieving a Pearson correlation coefficient of 0.85 on the mutation/TKI pairs. Model predictions for clinically relevant drug-mutation pairs consistently aligned with established clinical outcomes, including EGFR T790M-mediated resistance to first-generation inhibitors, ALK G1202R resistance to crizotinib, and BRAF V600E sensitivity to selective inhibitors. A protein sequence language model (ESM2) offered improved predictions for complex, rare variants.

Conclusions: This study introduces a foundational machine learning framework for predicting the impact of kinase mutations on in vitro drug sensitivity. This methodology supports personalized therapy development in NSCLC and may enhance the efficiency of drug discovery pipelines.
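
The evaluation quantities named in the Methods (MSE, R², PCC, and Monte Carlo dropout uncertainty) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' pipeline: the function names, the toy pIC50 values, and the single linear layer standing in for a deep model are all assumptions made for illustration. The Monte Carlo dropout part only shows the general idea of keeping dropout active at inference and summarizing repeated stochastic passes by their mean and standard deviation.

```python
import math
import random
from statistics import mean, stdev


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted pIC50 values."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mc_dropout_predict(features, weights, bias, p_drop=0.2, n_passes=100, seed=0):
    """Toy Monte Carlo dropout on a single linear layer (hypothetical model).

    Each pass zeroes every input feature with probability p_drop and rescales
    the survivors by 1 / (1 - p_drop) (inverted dropout). Returns the mean
    prediction and the sample standard deviation across passes, the latter
    serving as a rough uncertainty estimate.
    """
    rng = random.Random(seed)
    keep = 1.0 - p_drop
    samples = []
    for _ in range(n_passes):
        masked = [x / keep if rng.random() < keep else 0.0 for x in features]
        samples.append(sum(w * x for w, x in zip(weights, masked)) + bias)
    return mean(samples), stdev(samples)


if __name__ == "__main__":
    # Made-up pIC50 values, purely for demonstration.
    measured = [5.0, 6.0, 7.0, 8.0]
    predicted = [5.5, 5.5, 7.5, 7.5]
    print("MSE:", mse(measured, predicted))
    print("R^2:", r_squared(measured, predicted))
    print("PCC:", pearson(measured, predicted))

    # Hypothetical two-feature input and fixed weights.
    mu, sigma = mc_dropout_predict([1.0, 2.0], [0.5, 0.25], bias=6.0)
    print("MC dropout mean and std:", mu, sigma)
```

In a real deep learning framework the same pattern would amount to leaving the dropout layers in training mode at prediction time and repeating the forward pass. The spread across passes is then reported alongside each predicted pIC50.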