scTransMIL Bridges Patient-Level Phenotypes and Single-Cell Transcriptomics for Cancer Screening and Heterogeneity Inference
Abstract
Single-cell sequencing technology has revolutionized cancer research by revealing unprecedented insights into tumor heterogeneity. However, reliably connecting patient-level cancer phenotypes with single-cell transcriptomic profiles remains challenging because of technical constraints and labeling ambiguity. This in turn hinders precise cancer screening and in-depth study of tumor mechanisms based on single-cell sequencing. To bridge this gap, we introduce scTransMIL, a scRNA-seq Transformer-based Multi-Instance Learning framework that learns whole-genome context to deliver cancer insights at three biological scales (sample, cell, and gene): (1) accurate patient-level cancer phenotype prediction, (2) precise single-cell disease scoring (validated on 4 million single cells), and (3) genome-wide biomarker discovery. Benchmark experiments demonstrate scTransMIL's exceptional performance, including robust out-of-distribution generalization and clinically relevant prediction of metastatic tissue-of-origin, a crucial capability for identifying cancers of unknown primary. At single-cell resolution, scTransMIL identified both known and novel biomarkers for tumor B cells that conventional differential expression analysis failed to detect, while remaining concordant with malignant cell annotations. scTransMIL's adaptability is exemplified in acute myelocytic leukemia, where minimal fine-tuning with only a few patient sample labels enabled: (i) discovery of novel disease subtypes with distinct clinical outcomes, (ii) reconstruction of differentiation trajectories at single-cell resolution, and (iii) identification of subtype-specific gene signatures. In summary, by systematically linking cellular and molecular profiles with clinical disease phenotypes, scTransMIL emerges as a transformative tool poised to advance both basic cancer research and precision oncology applications.
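The abstract does not describe how scTransMIL is built internally. As a generic illustration of the multi-instance learning idea it names, the sketch below applies standard attention-based MIL pooling to one patient sample treated as a "bag" of cell embeddings. This is not the authors' implementation: the function name `attention_mil_forward`, the parameters `V`, `w`, `u`, `b`, and the random inputs are all hypothetical, and the Transformer encoder that would produce the cell embeddings is omitted.

```python
import numpy as np

def attention_mil_forward(cells, V, w, u, b=0.0):
    """Generic attention-based MIL pooling (illustrative only).

    cells: (n_cells, d) array of cell embeddings for one patient sample.
    V, w:  attention parameters with shapes (d, h) and (h,).
    u, b:  linear classifier on the pooled sample embedding.
    Returns the sample-level probability and per-cell attention weights.
    """
    scores = np.tanh(cells @ V) @ w           # one raw score per cell
    scores = scores - scores.max()            # numerical stability for softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    bag = attn @ cells                        # attention-weighted sample embedding
    logit = bag @ u + b
    prob = 1.0 / (1.0 + np.exp(-logit))       # sample-level phenotype probability
    return prob, attn

# Hypothetical toy input: 500 cells with 32-dimensional embeddings.
rng = np.random.default_rng(0)
n_cells, d, h = 500, 32, 16
cells = rng.normal(size=(n_cells, d))
V = rng.normal(size=(d, h)) * 0.1
w = rng.normal(size=h)
u = rng.normal(size=d) * 0.1

prob, attn = attention_mil_forward(cells, V, w, u)
```

In this kind of setup only the sample-level label supervises training, while the per-cell attention weights give a cell-level ranking. That is one common way a patient-level phenotype can be linked to individual cells, though whether scTransMIL derives its single-cell disease scores this way is not stated in the abstract.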