Resolving malignant cell heterogeneity from bulk tumor RNA-seq data with CDState

Abstract

Intratumor transcriptional heterogeneity (ITTH), defined by the coexistence of diverse cell states within one tumor, complicates cancer treatment by contributing to variable therapeutic responses. Although single-cell RNA sequencing can resolve this complexity, its cost and technical demands limit its large-scale use. Bulk RNA-seq data provide a scalable alternative, but most deconvolution methods depend on predefined references, restricting their ability to detect novel malignant states. Unsupervised approaches avoid these constraints but are not tailored to capture heterogeneity within the malignant compartment. To address these limitations, we introduce CDState, an unsupervised method for inferring malignant cell subpopulations from bulk RNA-seq data. CDState uses non-negative matrix factorization augmented with a sum-to-one constraint and a cosine similarity-based optimization to deconvolve bulk gene expression into distinct cell state profiles. We demonstrate the robustness of CDState on bulkified single-cell RNA-seq datasets from five cancer types, showing that it outperforms existing unsupervised deconvolution methods in the estimation of both cell state proportions and gene expression profiles. Applied to 33 cancer types from The Cancer Genome Atlas, CDState reveals recurrent gene programs, including epithelial-mesenchymal transition, MYC targets, and oxidative phosphorylation, as major contributors to malignant cell ITTH. We further link malignant states to patient clinical features, identifying states associated with poor prognosis. We propose an intratumor heterogeneity index and show its association with patient survival, clinical characteristics, and therapeutic response. Finally, we identify mutations and copy number alterations in genes such as TP53, KRAS, PIK3CA, SOX2, and SATB1 as potential genetic drivers of malignant cell ITTH across cancer types.
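
To make the idea of constrained factorization concrete, the following is a minimal sketch and not the CDState implementation. It assumes a genes-by-samples bulk matrix X approximated as S @ P, where the columns of S are cell state expression profiles and each column of P holds one sample's state proportions. The multiplicative updates, the renormalization step used to enforce the sum-to-one constraint, and the function names `deconvolve_sum_to_one` and `mean_pairwise_cosine` are all illustrative assumptions; the abstract does not specify how CDState applies the constraint or the cosine similarity criterion.

```python
import numpy as np


def deconvolve_sum_to_one(X, k, n_iter=500, seed=0, eps=1e-9):
    """Illustrative NMF: X (genes x samples) ~ S (genes x k) @ P (k x samples).

    Assumption: proportions are kept on the simplex by renormalizing each
    column of P after every multiplicative update. This is a simple heuristic,
    not necessarily the scheme used by CDState.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    S = rng.random((n_genes, k)) + eps
    P = rng.random((k, n_samples)) + eps
    P /= P.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # Standard multiplicative updates for the squared-error objective.
        S *= (X @ P.T) / (S @ P @ P.T + eps)
        P *= (S.T @ X) / (S.T @ S @ P + eps)
        # Enforce the sum-to-one constraint on each sample's proportions.
        P /= np.maximum(P.sum(axis=0, keepdims=True), eps)
    return S, P


def mean_pairwise_cosine(S, eps=1e-12):
    """Average cosine similarity between distinct state profiles (columns of S).

    Lower values indicate more distinct inferred states. Using this as a
    diagnostic is an assumption made for illustration only.
    """
    U = S / np.maximum(np.linalg.norm(S, axis=0, keepdims=True), eps)
    C = U.T @ U
    k = C.shape[0]
    return (C.sum() - np.trace(C)) / (k * (k - 1))
```

On a simulated mixture, for example X built from random non-negative profiles and Dirichlet-distributed proportions, `deconvolve_sum_to_one(X, k=3)` returns proportions whose columns each sum to one, and `mean_pairwise_cosine(S)` gives a rough indication of how well separated the inferred profiles are.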
