Cross-assay RNA modeling reveals cancer biomarkers
Abstract
The clinical heterogeneity of cancer poses a major challenge for precision medicine. Limited cohort sizes across evolving assay platforms impede reliable biomarker discovery. Here, we systematically evaluate how to integrate data from four transcriptomics platforms for predictive modeling in cancer: bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq), NanoString, and microarray. We use high-grade serous carcinoma (HGSC) of tubo-ovarian origin as a model system, as it is highly heterogeneous in both biology and assay data.
We find that using fold-change of gene expression in patients with matched pre- and post-neoadjuvant chemotherapy samples reduces inter-patient and inter-assay variability but is insufficient to overcome platform-specific biases. Microarray and scRNA-seq data exhibit systematic biases, while RNA-seq and NanoString show the most promise for combination into a single training cohort. To mitigate inter-assay limitations, we generate a new data set of HGSC tumor samples profiled with both RNA-seq and NanoString, and use it to identify the limits of detection and optimal harmonization strategies. Our approaches enable integration of cohorts for separate and combined RNA-seq and NanoString predictive models of disease recurrence (test-set AUROCs > 0.8), validated in external microarray cohorts.
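The fold-change idea above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `log2_fold_change`, the pseudocount of 1, and the expression values are all assumptions invented for illustration (only the gene name GBP4 comes from the text). It shows how each patient's post-treatment sample is expressed relative to the same patient's pre-treatment sample, so that a patient-specific baseline cancels out.

```python
import math

def log2_fold_change(pre, post, pseudocount=1.0):
    """Per-gene log2(post / pre) for one patient's matched samples.

    `pre` and `post` map gene name -> expression value on the same scale.
    The pseudocount (an assumption here) avoids division by zero.
    """
    shared = pre.keys() & post.keys()  # only genes measured in both samples
    return {
        gene: math.log2((post[gene] + pseudocount) / (pre[gene] + pseudocount))
        for gene in sorted(shared)
    }

# Toy values, invented purely for illustration (not data from the study).
pre_nact = {"GBP4": 3.0, "GENE_A": 15.0}
post_nact = {"GBP4": 15.0, "GENE_A": 7.0}

lfc = log2_fold_change(pre_nact, post_nact)
print(lfc)
```

Because each value is a within-patient ratio, a constant multiplicative offset that affects both samples of a patient largely cancels. This is consistent with the reduced inter-patient variability described above, but it does not remove gene-specific platform biases, which is why harmonization across assays is still needed.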
We leverage single-cell and bulk RNA-seq network-based analyses to provide mechanistic context for genes in the predictive models. Our models indicate that GBP4 expression is a key predictor of recurrence and marks immune remodeling towards cytotoxicity. We provide an interactive web portal to facilitate exploration of data and results. These findings guide cross-assay harmonization of transcriptomic data and enable improved predictive modeling in heterogeneous cancers.
Statement of Significance
We present a framework for integrating RNA-seq, NanoString, microarray, and single-cell transcriptomic data for predictive modeling, enabling robust biomarker discovery in heterogeneous cancers and identifying GBP4 as a marker of immune remodeling.