AI-based separation of malignant cell- and microenvironment-specific gene expression from bulk RNA sequencing enhances biomarker interpretation

Abstract

Bulk RNA sequencing (RNA-seq)-based gene expression analysis is a promising tool for personalized cancer diagnostics, disease monitoring, and treatment decision-making. However, its clinical utility is limited by interference from non-malignant cells of the tumor microenvironment, which can dominate transcript data in low-purity tumors. While cell deconvolution methods such as Kassandra can predict digital cell percentages from bulk RNA-seq, approaches for delineating the gene expression contributions of individual tumor compartments remain limited. To overcome this limitation, we developed Helenus, a machine-learning-based tool that separates gene expression between malignant and non-malignant cells. Trained on over 200 million synthetic RNA profiles representing diverse tumor types and purities, Helenus demonstrated high accuracy in attributing gene expression to its cellular origin. Helenus also uncovered true genomic-RNA correlations such as copy number alterations and the expression of therapeutic antibody-drug conjugate targets specifically on tumor cells. Helenus provides critical insights into tumor biology and immunotherapy response by precisely identifying biomarker expression, paving the way for more effective personalized cancer care.
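
To make the purity problem concrete, the sketch below mixes a malignant and a microenvironment expression profile at a chosen tumor purity, which is one simple way a synthetic bulk profile with known compartment contributions could be built. It is an illustration only and not the Helenus training pipeline: the linear mixing model, the function name `mix_bulk_profile`, and all expression values are assumptions. The gene symbols ERBB2 and PTPRC are real, but they are used here purely as examples.

```python
def mix_bulk_profile(malignant, microenvironment, purity):
    """Mix two per-gene expression profiles at a given tumor purity.

    Hypothetical helper: assumes bulk expression is a purity-weighted
    linear sum of the two compartments. Returns the bulk profile plus
    the ground-truth contribution of each compartment.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    if malignant.keys() != microenvironment.keys():
        raise ValueError("profiles must cover the same genes")

    # Share of each gene's bulk signal that comes from malignant cells.
    malignant_part = {g: purity * v for g, v in malignant.items()}
    # Share that comes from non-malignant microenvironment cells.
    micro_part = {g: (1.0 - purity) * microenvironment[g] for g in malignant}
    # The bulk measurement only ever sees the sum of the two.
    bulk = {g: malignant_part[g] + micro_part[g] for g in malignant}
    return bulk, malignant_part, micro_part


# Illustrative values in arbitrary expression units (not real data).
malignant = {"ERBB2": 120.0, "PTPRC": 0.0}
microenvironment = {"ERBB2": 4.0, "PTPRC": 80.0}

bulk, from_tumor, from_tme = mix_bulk_profile(malignant, microenvironment, purity=0.25)
print(bulk)        # combined signal a bulk assay would report
print(from_tumor)  # ground-truth malignant contribution
print(from_tme)    # ground-truth microenvironment contribution
```

In this toy low-purity sample, the bulk value alone does not show how much of each gene's signal is tumor-derived. A separation tool has to recover the per-compartment contributions from the bulk value, and synthetic mixtures of this kind provide known answers to train and evaluate against.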

Significance

Helenus extracts gene expression profiles of cancerous and non-cancerous compartments of tumor biopsies from bulk RNA-seq data, enabling the determination of how the expression of specific genes affects malignancy and tumor immunity.
