PROtein Analytics for Kinase Therapeutic Inhibitor Variants (PROAKTIV): A Machine learning approach to predict efficacy of tyrosine kinase inhibitors against mutations in EGFR, ALK and BRAF

Harold Mateo Mojica Urrego
Matthew Groves
Anthonie Van der Wekken
Juvenal Yosa Reyes

Abstract

Introduction: The efficacy of targeted therapies in non-small cell lung cancer (NSCLC) is challenged by acquired resistance, driven by on-target mutations in kinases such as EGFR, ALK, and BRAF. Predicting the functional impact of these mutations on tyrosine kinase inhibitor (TKI) sensitivity is therefore critical for personalized treatment. This study presents a foundational machine learning framework for predicting ligand bioactivity against mutated kinases from sequence data, using EGFR, ALK, and BRAF as proof-of-concept models. The results demonstrate the feasibility of developing a generalized predictive model applicable across the kinase family.

Methods: An automated Python pipeline was developed to curate mutation and bioactivity data from a comprehensive dataset of 25,412 published in vitro pIC50 values for EGFR, ALK, and BRAF variants. Twelve deep learning architectures were trained and evaluated with different encoders for proteins and ligands. Model performance was assessed using Mean Squared Error (MSE), R-squared (R²), and Pearson Correlation Coefficient (PCC), with uncertainty quantified via Monte Carlo dropout.

Results: The best-performing model showed robust predictive accuracy, with a Pearson correlation coefficient of 0.85 on the mutation/TKI pairs. Model predictions for clinically relevant drug-mutation pairs consistently aligned with established clinical outcomes, including EGFR T790M-mediated resistance to first-generation inhibitors, ALK G1202R resistance to crizotinib, and BRAF V600E sensitivity to selective inhibitors. A protein sequence language model (ESM2) offered improved predictions for complex, rare variants.

Conclusions: This study introduces a foundational machine learning framework for predicting the impact of kinase mutations on in vitro drug sensitivity. This methodology supports personalized therapy development in NSCLC and may enhance the efficiency of drug discovery pipelines.
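The three evaluation metrics named in the abstract (MSE, R², and PCC) can be illustrated with a minimal sketch on pIC50 values. This is not the authors' pipeline: the function names, the nanomolar input unit, and the four measured/predicted values below are hypothetical and chosen only to make the arithmetic easy to follow. The only domain fact assumed is the standard definition pIC50 = -log10(IC50 in mol/L).

```python
import math
from statistics import mean


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    m_t, m_p = mean(y_true), mean(y_pred)
    cov = sum((t - m_t) * (p - m_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - m_t) ** 2 for t in y_true)
    var_p = sum((p - m_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical data: four measured IC50 values (nM) and made-up model outputs.
measured = [pic50_from_ic50_nm(x) for x in (1.0, 10.0, 100.0, 1000.0)]
predicted = [8.5, 8.0, 7.5, 6.0]

print(f"MSE = {mse(measured, predicted):.3f}")
print(f"R2  = {r_squared(measured, predicted):.3f}")
print(f"PCC = {pearson(measured, predicted):.3f}")
```

Note that PCC measures linear association only, so a model can score a high PCC while still being systematically offset; this is one reason to report it alongside MSE and R², as the abstract does.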

Version published to 10.21203/rs.3.rs-8730878/v1 on Research Square
Feb 6, 2026

Related articles

Integrating Machine Learning-Based Molecular Design with Experimental Validation for the Discovery of EGFR Inhibitors in Lung Cancer

This article has 7 authors:
1. Hailing Qie
2. Liyuan Wang
3. Ce Li
4. Chen Wu
5. Yong Wang
6. Kuo Xiao
7. Lili Diao
This article has no evaluations. Latest version: Mar 20, 2026

TP53 mutation is associated with resistance to CDK4/6 Inhibitors, but not to Immune Checkpoint Inhibitors, in ER-positive/HER2-negative Breast Cancer: Real-World perspective

This article has 9 authors:
1. Masanori Oshi
2. Makoto Sugimori
3. Kei Kawashima
4. Shipra Gandhi
5. Akimitsu Yamada
6. Kazutaka Narui
7. Takashi Ishikawa
8. Itaru Endo
9. Kazuaki Takabe
This article has no evaluations. Latest version: Feb 23, 2026

Identification of Gallagyldilacton as a Mutation-Biased EGFR Inhibitor from Punica granatum Peel via Integrative Computational Analysis

This article has 2 authors:
1. Doğacan Jeremy Edens
2. Özgür Cem Erkin
This article has no evaluations. Latest version: Mar 10, 2026
