Training Strategy Optimization to Mitigate Shortcut Learning in Pan-Cancer Drug Response Prediction

Kazuki Shimamoto
Takafumi Ito
Artem Lysenko
Tatsuhiko Tsunoda

Abstract

Background

Prediction of in vivo drug response is a central challenge in precision medicine, but the scarcity of labeled clinical data still necessitates the use of large-scale cancer cell line resources for model training. Domain adaptation methods, which aim to transfer knowledge learned from a source domain (cell lines) to a target domain (patients) by aligning feature distributions across domains, are a promising approach to bridge the gap between in vitro models and in vivo patients. However, we observed that these methods can exhibit a significant discrepancy between pan-cancer evaluation metrics and cancer type-specific prediction accuracy. This performance gap warrants a detailed investigation into their underlying predictive characteristics.

Results

We discovered that cancer-type-specific class imbalances in training data can lead domain adaptation models to engage in shortcut learning, where they primarily discriminate between cancer types rather than capturing the actual biological determinants of drug sensitivity. To address this, we propose a strategy of combining two approaches: (1) excluding cancer types causing imbalance from the training data, and (2) adjusting class balance through oversampling and class weighting while retaining cancer types causing the imbalance. Among all configurations tested in conjunction with the CODE-AE (Context-aware Deconfounding AutoEncoder) framework, the combination of moderate oversampling (30% non-responder ratio) with class weighting achieved the best performance, significantly improving prediction accuracy in 5 out of 11 external patient cohorts from TCGA and GEO.
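The class balance adjustment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy label counts, the choice of random duplication as the oversampling method, and the inverse-frequency weighting formula are all assumptions made for illustration. It also assumes that "30% non-responder ratio" means non-responders make up about 30% of the training set after oversampling.

```python
import math
import random
from collections import Counter


def oversample_to_ratio(labels, minority_label, target_ratio, seed=0):
    """Return sample indices in which `minority_label` makes up roughly
    `target_ratio` of the set, by duplicating minority samples at random.
    Adds nothing if the ratio is already met. (Hypothetical helper.)"""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be in (0, 1)")
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    # Solve m / (m + n_majority) >= target_ratio for the minority count m.
    needed = math.ceil(target_ratio * len(majority) / (1.0 - target_ratio))
    extra = max(0, needed - len(minority))
    rng = random.Random(seed)
    duplicates = [rng.choice(minority) for _ in range(extra)]
    return majority + minority + duplicates


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    (Assumed weighting scheme; the source does not specify one.)"""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


# Toy training set (made-up counts): 90 responders (1), 10 non-responders (0).
labels = [1] * 90 + [0] * 10
idx = oversample_to_ratio(labels, minority_label=0, target_ratio=0.30)
resampled = [labels[i] for i in idx]
ratio = resampled.count(0) / len(resampled)
weights = balanced_class_weights(resampled)
print(f"non-responder ratio after oversampling: {ratio:.3f}")
print("class weights:", weights)
```

Because the oversampling stops at a moderate ratio rather than full parity, the residual imbalance is left for the class weights to absorb in the loss; in this sketch the minority class still receives the larger weight after resampling.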

Conclusions

Our findings demonstrate that appropriate class imbalance correction—rather than wholesale exclusion of imbalanced cancer subtypes—enables effective utilization of biologically relevant information shared across cancer types for drug response prediction. This study highlights the critical importance of jointly optimizing training data composition and class balance adjustment strategies in developing robust pan-cancer drug response prediction models for precision medicine applications.

Highlights

Identified a critical discrepancy in current domain adaptation models for drug response prediction: high pan-cancer accuracy often masks poor performance within specific cancer types.
Revealed the root cause as “shortcut learning,” where models tend to distinguish between cancer tissue types (hematological vs. solid) rather than learning individual drug sensitivity.
Discovered severe class imbalance in training data, with hematological cell lines being disproportionately drug-responsive across multiple chemotherapeutics.
Proposed an architecture-agnostic fix, evaluated with the CODE-AE framework: moderate oversampling (30% non-responder ratio) combined with class weighting.
Demonstrated significant improvements in 5 of 11 external patient cohorts, showing that correcting class bias is more effective than simply excluding problematic data.
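The masking effect described in the first highlights can be reproduced on synthetic data. The sketch below is a hedged illustration, not the study's evaluation code: the response rates, sample sizes, score distributions, and the `auroc` helper are all hypothetical. It simulates a "shortcut" model whose score depends only on tissue type and compares pooled AUROC with AUROC computed within each tissue type.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as 0.5). O(n_pos * n_neg), fine for small toy data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
samples = []  # (tissue, label, score)
# Hypothetical setup: hematological samples are 90% responders, solid 20%.
groups = [("hematological", 200, 0.9, 0.8), ("solid", 200, 0.2, 0.2)]
for tissue, n, response_rate, base_score in groups:
    for _ in range(n):
        label = 1 if rng.random() < response_rate else 0
        # Shortcut model: the score depends only on tissue plus noise,
        # never on the sample's own response label.
        score = base_score + rng.gauss(0.0, 0.05)
        samples.append((tissue, label, score))

pooled = auroc([s for _, _, s in samples], [y for _, y, _ in samples])
per_tissue = {}
for tissue in ("hematological", "solid"):
    sub = [(y, s) for t, y, s in samples if t == tissue]
    per_tissue[tissue] = auroc([s for _, s in sub], [y for y, _ in sub])

print(f"pooled AUROC: {pooled:.2f}")
for tissue, value in per_tissue.items():
    print(f"{tissue} AUROC: {value:.2f}")
```

In this toy setting the pooled score looks strong only because tissue type correlates with response, while the within-tissue scores hover near chance. Reporting stratified metrics alongside the pooled one is a simple way to surface this kind of gap.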

Version published to 10.64898/2026.05.23.725295 on bioRxiv
May 27, 2026

