Variance-based Decomposition of Inter-patient Transcriptomic Heterogeneity Reveals Recurrent Modes of Therapeutic Antigen Biology Across 33 Cancer Types
Abstract
Background
Inter-patient expression heterogeneity is not merely statistical dispersion but a structured biological signal encoding distinct therapeutic antigen states. Current antigen discovery frameworks over-prioritize mean expression and systematically underinterpret this variance structure, potentially obscuring the most clinically actionable targets. We asked whether inter-patient transcriptomic heterogeneity could be operationalized as a decomposition framework to reveal recurrent, biologically interpretable modes of therapeutic antigen biology.

Methods
We developed TANK (Tumor Antigen prioritization by variance-based raNKing) as a heterogeneity decomposition framework: not primarily a ranking method, but an approach to resolving recurrent antigen modes from patient-level transcriptomic distributions. TANK was applied across 33 TCGA cancer types (n > 11,000 patients, 60,656 genes). A 10-gene reference panel was predefined based on independent clinical development status. Non-randomness was confirmed against 1,000 random gene set controls (empirical p < 0.0001). External validation was performed in two independent gastric cancer cohorts (GEO GSE26942, n = 217; ACRG GSE66229, n = 400). Single-cell validation was conducted across three cancer types totaling 28,617 annotated tumor epithelial cells. Mode 3 candidates were characterized by survival analysis, immune correlation, and DepMap CRISPR dependency. Beyond the reference panel, CLDN6 was examined as a showcase candidate across the same dimensions.

Results
TANK resolved four recurrent modes of therapeutic antigen heterogeneity: tumor-restricted rare activation (Mode 1: PRAME), lineage-dependent expression (Mode 2: CLDN18), tumor-enriched heterogeneous expression with functional stratification (Mode 3: MSLN/OLFM4/VSIG1/MUC16), and mean-dominant baseline (Mode 4: ERBB2/EGFR). Each mode carries distinct implications for patient stratification, immune context, and translational modality. Concordance between TANK and MAD confirms that the signal reflects a robust variance-associated structure rather than an artifact of any single metric. Beyond the reference panel, CLDN6 ranked in the top 0.22% (comparable to FDA-approved CLDN18 in the top 0.16%), with a significant adverse survival association (p = 0.0049), immune-cold correlates (CD274 r = -0.204, p < 0.0001), and directional CRISPR dependency across gastric (mean = -0.227), colorectal (mean = -0.242), and lung cancer cell lines (mean = -0.288).

Conclusions
Inter-patient transcriptomic variance defines a previously underutilized axis of antigen biology encoding recurrent therapeutic modes that are systematically inaccessible to mean-based approaches: CV-based ranking recovers 0/9 reference targets and DESeq2 fails to prioritize CLDN18 and CLDN6 within the top 20%, while variance decomposition places both in the top 0.22%. The four-mode framework enables prospective mapping of novel candidates to distinct therapeutic strategies from heterogeneity structure alone, without prior biological knowledge of the candidate. CLDN6 exemplifies this capacity: nominated solely from variance structure, its convergent multi-dimensional evidence positions it as a high-priority oncofetal antigen candidate for patients with limited therapeutic options.
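
The abstract does not specify how TANK scores genes, so the following Python sketch does not reproduce it. It only illustrates the general idea of ranking genes by inter-patient dispersion and testing a predefined panel against random gene sets. It assumes that MAD denotes median absolute deviation and CV the coefficient of variation; the function names, the toy expression matrix, the panel-median test statistic, and the add-one p-value estimator are all hypothetical choices made for illustration.

```python
import numpy as np

def mad_per_gene(expr):
    """Median absolute deviation across patients (rows = genes, cols = patients)."""
    med = np.median(expr, axis=1, keepdims=True)
    return np.median(np.abs(expr - med), axis=1)

def cv_per_gene(expr, eps=1e-9):
    """Coefficient of variation across patients; eps guards against zero means."""
    return expr.std(axis=1, ddof=1) / (np.abs(expr.mean(axis=1)) + eps)

def top_percent(scores):
    """Rank position of each gene as a percentage (small value = high score)."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return 100.0 * ranks / len(scores)

def empirical_p(scores, panel_idx, n_sets=1000, seed=0):
    """Compare the panel's median score with size-matched random gene sets.

    Uses the add-one estimator, so the smallest attainable value is
    1 / (n_sets + 1). This is an assumed estimator, not the authors'.
    """
    rng = np.random.default_rng(seed)
    observed = np.median(scores[panel_idx])
    k = len(panel_idx)
    null = np.array([
        np.median(scores[rng.choice(len(scores), size=k, replace=False)])
        for _ in range(n_sets)
    ])
    return (1 + np.sum(null >= observed)) / (n_sets + 1)

# Toy data (synthetic, not TCGA): 200 genes x 60 patients.
rng = np.random.default_rng(0)
expr = rng.normal(5.0, 0.1, size=(200, 60))
expr[:5, :30] += 6.0            # genes 0-4 are high in half of the patients
panel = np.arange(5)            # hypothetical "reference panel"

mad = mad_per_gene(expr)
cv = cv_per_gene(expr)
pct = top_percent(mad)
p = empirical_p(mad, panel)

print("panel rank (top %) by MAD:", pct[panel])
print("empirical p vs. random gene sets:", p)
```

On this toy matrix the five heterogeneous genes occupy the top MAD ranks by construction. Real expression data would additionally need normalization and a decision about how to treat genes with near-zero mean, neither of which is covered here.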