Variance-based Decomposition of Inter-patient Transcriptomic Heterogeneity Reveals Recurrent Modes of Therapeutic Antigen Biology Across 33 Cancer Types

Abstract

Background: Inter-patient expression heterogeneity is not merely statistical dispersion but a structured biological signal that encodes distinct therapeutic antigen states. Current antigen discovery frameworks over-prioritize mean expression and systematically underinterpret this variance structure, which may obscure the most clinically actionable targets. We asked whether inter-patient transcriptomic heterogeneity could be operationalized as a decomposition framework that reveals recurrent, biologically interpretable modes of therapeutic antigen biology.

Methods: We developed TANK (Tumor Antigen prioritization by variance-based raNKing) as a heterogeneity decomposition framework: not primarily a ranking method, but an approach for resolving recurrent antigen modes from patient-level transcriptomic distributions. TANK was applied across 33 TCGA cancer types (n > 11,000 patients; 60,656 genes). A 10-gene reference panel was predefined on the basis of independent clinical development status. Non-randomness was confirmed against 1,000 random gene set controls (empirical p < 0.0001). External validation was performed in two independent gastric cancer cohorts (GEO GSE26942, n = 217; ACRG GSE66229, n = 400). Single-cell validation was conducted across three cancer types, totaling 28,617 annotated tumor epithelial cells. Mode 3 candidates were characterized by survival analysis, immune correlation, and DepMap CRISPR dependency. Beyond the reference panel, CLDN6 was examined as a showcase candidate along the same dimensions.

Results: TANK resolved four recurrent modes of therapeutic antigen heterogeneity: tumor-restricted rare activation (Mode 1: PRAME), lineage-dependent expression (Mode 2: CLDN18), tumor-enriched heterogeneous expression with functional stratification (Mode 3: MSLN/OLFM4/VSIG1/MUC16), and mean-dominant baseline expression (Mode 4: ERBB2/EGFR). Each mode carries distinct implications for patient stratification, immune context, and translational modality. Concordance between TANK and median absolute deviation (MAD) ranking confirms that the signal reflects a robust variance-associated structure rather than an artifact of any single metric. Beyond the reference panel, CLDN6 ranked in the top 0.22% of genes (comparable to CLDN18, the target of an FDA-approved therapy, at the top 0.16%), with a significant adverse survival association (p = 0.0049), immune-cold correlates (CD274 r = -0.204, p < 0.0001), and directional CRISPR dependency in gastric (mean dependency score = -0.227), colorectal (mean = -0.242), and lung cancer cell lines (mean = -0.288).

Conclusions: Inter-patient transcriptomic variance defines a previously underutilized axis of antigen biology, encoding recurrent therapeutic modes that are systematically inaccessible to mean-based approaches: CV-based ranking recovers 0/9 reference targets, and DESeq2 fails to prioritize CLDN18 and CLDN6 within the top 20%, whereas variance decomposition places both within the top 0.22%. The four-mode framework enables prospective mapping of novel candidates to distinct therapeutic strategies from heterogeneity structure alone, without prior biological knowledge of the candidate. CLDN6 exemplifies this capacity: nominated solely on the basis of variance structure, it is supported by convergent multi-dimensional evidence that positions it as a high-priority oncofetal antigen candidate for patients with limited therapeutic options.
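The abstract does not specify how TANK computes its scores. As a rough illustration only, the sketch below assumes that each gene is scored by a single dispersion statistic of its log-scale expression across patients (variance, MAD, or CV), that the score is converted to a "top X%" rank among all genes, and that a gene panel is compared with random gene sets of the same size through an empirical p-value. The function names (`score_genes`, `top_percent`, `empirical_p`), the toy table, and the choice of median percentile as the panel statistic are all hypothetical. This is not the TANK implementation, and it does not perform the four-mode decomposition.

```python
import random
import statistics

# Hypothetical sketch, not the TANK implementation.
# Assumed input format: gene -> expression values across patients,
# already on a log scale (e.g. log2(TPM + 1)).
ExprTable = dict[str, list[float]]


def mad(values: list[float]) -> float:
    """Unscaled median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def score_genes(expr: ExprTable, metric: str = "variance") -> dict[str, float]:
    """Score every gene by one inter-patient dispersion statistic."""
    scores: dict[str, float] = {}
    for gene, values in expr.items():
        if metric == "variance":
            scores[gene] = statistics.pvariance(values)
        elif metric == "mad":
            scores[gene] = mad(values)
        elif metric == "cv":
            # Coefficient of variation: SD / mean (undefined for mean <= 0; set to 0 here).
            mean = statistics.fmean(values)
            scores[gene] = statistics.pstdev(values) / mean if mean > 0 else 0.0
        else:
            raise ValueError(f"unknown metric: {metric}")
    return scores


def top_percent(scores: dict[str, float]) -> dict[str, float]:
    """Rank genes by descending score; rank r of N genes -> 'top 100*r/N %'."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: 100.0 * (i + 1) / n for i, gene in enumerate(ordered)}


def empirical_p(panel: list[str], pct: dict[str, float],
                n_draws: int = 1000, seed: int = 0) -> float:
    """Empirical p-value for the panel's median percentile vs. random gene sets.

    A lower percentile means a higher priority. Uses the (1 + hits) / (1 + n_draws)
    estimator, so the smallest attainable value is 1 / (n_draws + 1).
    """
    rng = random.Random(seed)
    genes = list(pct)
    observed = statistics.median([pct[g] for g in panel])
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(genes, len(panel))
        if statistics.median([pct[g] for g in draw]) <= observed:
            hits += 1
    return (1 + hits) / (1 + n_draws)


# Toy example with four hypothetical genes and six patients.
toy: ExprTable = {
    "GENE_A": [0.1, 0.2, 8.0, 0.1, 7.5, 0.3],  # rare, strong activation
    "GENE_B": [5.0, 5.1, 4.9, 5.2, 5.0, 4.8],  # high mean, little spread
    "GENE_C": [1.0, 3.0, 2.0, 4.0, 1.5, 3.5],  # moderate spread
    "GENE_D": [0.0, 0.0, 0.1, 0.0, 0.1, 0.0],  # nearly silent
}
var_pct = top_percent(score_genes(toy, "variance"))
cv_pct = top_percent(score_genes(toy, "cv"))
p_value = empirical_p(["GENE_A"], var_pct, n_draws=200, seed=1)
print(var_pct)  # GENE_A ranks first (top 25%) by variance
print(cv_pct)   # GENE_D ranks first by CV because its mean is near zero
print(p_value)
```

In this toy table, variance ranks GENE_A (two strongly activated patients) first. CV instead ranks the nearly silent GENE_D first, because dividing by a mean close to zero inflates its score. This sensitivity to small denominators is one general reason why CV-based and variance-based rankings can diverge.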