Monotherapy cancer drug-blind response prediction is limited to intraclass generalization

William G. Herbert
Nicholas Chia
Paul A. Jensen
Marina RS Walther-Antonio

Abstract

Monotherapy cancer drug response prediction (DRP) models predict the response of a cell line to a given drug. Evaluating these models includes assessing their ability to predict the response of cell lines to new drugs, i.e. drugs that are absent from the training set. Such drug-blind prediction shows greatly diminished performance, or outright failure, across a wide range of model architectures and large pharmacogenomic datasets. Drug-blind failure is hypothesized to be caused by the relatively limited set of drugs present in these datasets. However, further cell line experiments carry significant time and cost, and it is impossible to predict beforehand how much data would be enough to overcome drug-blind failure. We must therefore determine how current data contributes to drug-blind failure before attempting to remedy it with further data collection. In this work, we quantify the extent to which drug-blind generalizability relies on mechanistic overlap of drugs between training and testing splits. We first show that most of the performance of DRP models evaluated on mixed splits, in which training and test sets share drugs, can be attributed to drug overfitting, which likely inhibits generalization and prevents accurate analysis. Then, by specifically probing the drug-blind ability of models, we reveal that the sources of generalizable drug features are confined to shared mechanisms of action and related pathways. Furthermore, we observe that, for certain mechanisms, restricting model training to a single mechanism significantly improves performance compared with training on all drugs simultaneously. We conclude that drug-blind performance is a poor benchmark for DRP because it describes dataset behavior rather than model behavior. Our investigation shows that deep learning models trained on large monotherapy cell line panels describe the mechanisms of action of drugs more accurately than their advertised connection to broader cancer biology.
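The difference between mixed and drug-blind evaluation, and the kind of train/test mechanistic overlap discussed above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: responses are stored as hypothetical `(cell_line, drug, response)` tuples, each drug carries exactly one mechanism-of-action (MoA) label, and the function names (`mixed_split`, `drug_blind_split`, `mechanism_blind_split`, `mechanism_overlap`) and the toy drug labels are illustrative inventions. The overlap score here is simply the fraction of held-out drugs whose MoA also appears among training drugs; the paper's own quantification may differ.

```python
import random

# Each record is a hypothetical (cell_line, drug, response) tuple.


def mixed_split(pairs, test_frac=0.2, seed=0):
    """Random split of records; the same drug can appear in both train and test."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def drug_blind_split(pairs, test_frac=0.2, seed=0):
    """Hold out whole drugs so that no test drug is seen during training."""
    drugs = sorted({drug for _, drug, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    test_drugs = set(drugs[: max(1, int(len(drugs) * test_frac))])
    train = [p for p in pairs if p[1] not in test_drugs]
    test = [p for p in pairs if p[1] in test_drugs]
    return train, test


def mechanism_blind_split(pairs, moa, held_out_moa):
    """Stricter variant: hold out every drug that shares one mechanism of action."""
    train = [p for p in pairs if moa[p[1]] != held_out_moa]
    test = [p for p in pairs if moa[p[1]] == held_out_moa]
    return train, test


def mechanism_overlap(train, test, moa):
    """Fraction of distinct test drugs whose MoA label also occurs among training drugs."""
    train_moas = {moa[drug] for _, drug, _ in train}
    test_drugs = {drug for _, drug, _ in test}
    if not test_drugs:
        return 0.0
    return sum(moa[d] in train_moas for d in test_drugs) / len(test_drugs)


# Toy panel with made-up labels: 5 cell lines x 10 drugs in three MoA classes.
moa = {
    f"D{i}": "EGFR inhibitor" if i < 4 else "MEK inhibitor" if i < 7 else "HDAC inhibitor"
    for i in range(10)
}
pairs = [(f"C{c}", drug, 0.0) for c in range(5) for drug in moa]

train, test = drug_blind_split(pairs, test_frac=0.2, seed=1)
print(mechanism_overlap(train, test, moa))  # 1.0: every held-out MoA is still in training

train_m, test_m = mechanism_blind_split(pairs, moa, "HDAC inhibitor")
print(mechanism_overlap(train_m, test_m, moa))  # 0.0: the held-out MoA never appears in training
```

In this toy panel every mechanism has at least three drugs, so holding out two random drugs always leaves their mechanisms represented in training, whereas holding out an entire mechanism removes that overlap completely. Comparing performance across the two kinds of split is one way to check whether apparent drug-blind generalization is really intraclass generalization.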

Author summary

In this paper, we characterize the feature space of cancer drug-blind prediction. To predict the efficacy of a cancer drug it has never seen before (the drug-blind setting), a model must be able to accurately compare this drug to all drugs it saw during training. The relationships between cancer drugs, i.e. the feature space, must therefore be described well enough for this comparison to be possible. We believe that these relationships are poorly defined because cancer DRP models always display reduced performance in a drug-blind context. For the first time, we quantify the limits of generalization in a drug-blind setting. We show that drug-blind generalization reflects the mechanistic relationships among drugs seen during model training. We also outline new criteria by which to judge the drug-blind ability of a model. Failure of drug-blind prediction is an often-overlooked shortcoming in cancer DRP with potentially damaging downstream implications. We hope to cast the drug-blind ability of these models in a new light and guide others toward more pertinent tasks in cancer deep learning.

Version published to bioRxiv (10.1101/2025.06.16.659838) on Jun 20, 2025.
