Evaluating deconvolution methods using real bulk RNA-expression data for robust prognostic insights across cancer types
Abstract
Background
Deconvolution of bulk RNA-expression data unlocks the cellular complexity of cancer, yet traditional pseudobulk benchmarks may be unreliable in real-world settings, where absolute cell proportions are unknown.
Results
Here, we introduce a novel real-data framework that leverages 18 bulk RNA-expression cohorts (5,891 samples) across nine cancer types to evaluate five deconvolution methods based on differentially proportioned (DP) and prognosis-related (PR) cell types. Across three innovative benchmark scenarios (consistency with scRNA-seq, reproducibility across cohorts, and reproducibility of prognostic relevance), ReCIDE and BayesPrism stand out as two robust deconvolution methods. A pan-cancer analysis based on deconvolution of the TCGA cohorts identifies matrix cancer-associated fibroblasts (mCAF) as a prognostic marker with consistent effects across multiple cancers. Building on this finding, we show that a prognostic indicator combining classical monocyte and mCAF proportions is significant in five TCGA cohorts, and we further validate it in five independent GEO cohorts.
Conclusions
This study broadens deconvolution benchmarking, offering actionable tools for precision oncology and guiding method selection for translational research.