Interventionally-guided representation learning for robust and interpretable AI models in cancer medicine

Abstract

Machine learning models hold promise in cancer medicine but often lack robustness and interpretability. We introduce a new class of models for high-dimensional molecular data that incorporate interventional auxiliary information to learn latent representations that are informative and interpretable by design. By using causal signals from genetic loss-of-function screens, our approach generates representations that generalize well across data distributions and biological contexts. In cancer cell line datasets, we show that causal guidance enables “zero-shot” transfer to cancer types unseen during training. Moreover, models trained solely on cell line data translate effectively to clinical cohorts, demonstrating strong “bench-to-bedside” generalization without fine-tuning. This strategy highlights a scalable way to leverage tractable laboratory assays for clinical modeling. More broadly, our results establish how integrating causal biological information within generative frameworks enhances data efficiency, interpretability, and robustness, opening avenues for a new generation of scientifically informed AI models in molecular medicine.
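The abstract does not specify an architecture, but one way to realize “interventional auxiliary information within a generative framework” is a variational autoencoder whose latent code is additionally supervised to predict loss-of-function screen readouts. The sketch below is a minimal illustration under that assumption; the class name `InterventionGuidedVAE`, all dimensions, and the auxiliary regression term are hypothetical and are not the authors' method.

```python
# Hypothetical sketch (not the paper's architecture): a VAE whose latent space
# is shaped by an auxiliary head regressing interventional readouts, e.g.
# per-gene loss-of-function viability scores from a genetic screen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterventionGuidedVAE(nn.Module):
    def __init__(self, n_genes=1000, n_latent=32, n_interventions=200):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_latent)
        self.logvar = nn.Linear(256, n_latent)
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(), nn.Linear(256, n_genes)
        )
        # Auxiliary head: predict intervention responses from the latent code,
        # so causal (interventional) signal guides the learned representation.
        self.aux_head = nn.Linear(n_latent, n_interventions)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), self.aux_head(z), mu, logvar

def loss_fn(x, y_screen, model, beta=1.0, gamma=1.0):
    """ELBO terms plus an auxiliary regression loss on screen readouts."""
    x_hat, y_hat, mu, logvar = model(x)
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    aux = F.mse_loss(y_hat, y_screen)  # causal-guidance term
    return recon + beta * kl + gamma * aux
```

In a formulation like this, the weight `gamma` (an illustrative parameter) controls how strongly the interventional signal shapes the latent space relative to reconstruction fidelity, which is the trade-off the abstract's claims about robustness and transfer would hinge on.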
