Variance-based Prioritization Reveals a Clinically Validated Antigen Discovery Space Systematically Inaccessible to Mean-Based Methods
Abstract
Background. Mean-based transcriptomic prioritization (differential expression analysis, DEG) dominates cancer target discovery but is optimized for identifying driver genes rather than therapeutic antigens. Whether variance-based prioritization captures a complementary and clinically relevant discovery space has not been systematically evaluated.

Methods. We applied variance-based prioritization (TANK) and four comparator methods (DEG, MAD, coefficient of variation [CV], and mean expression) to genome-wide transcriptomic data from TCGA gastric adenocarcinoma (n = 443 tumor samples). We evaluated recall of a gold standard set of 28 clinically validated therapeutic antigens (FDA-approved and Phase 2+ ADC/CAR-T/TCR-T targets) at three ranking thresholds. Mechanistic specificity was assessed by comparing surface protein enrichment, driver oncogene enrichment, and therapeutic antigen recall between high-variance and low-variance gene sets.

Results. Across all thresholds, TANK substantially outperformed all comparator methods in therapeutic antigen recall (top 5%: TANK 25%, DEG 3.6%, MAD 3.6%, CV 0%, Mean 3.6%). In the primary analysis using the full gene universe (60,654 genes), TANK recovered 50% of gold standard targets at the top 5% threshold versus 3.6% for DEG (OR = 27.0, p = 0.000071). High-variance genes were not globally enriched for surface proteins (9.1% vs 9.2%, OR = 0.98, p = 0.58), ruling out surface protein abundance as an explanatory factor. Instead, the high-variance gene set was specifically depleted of canonical driver oncogenes (OR = 0.48, p = 0.0004) and strongly enriched for therapeutic antigens relative to low-variance genes (OR = ∞, p = 0.002). Three targets nominated prospectively by TANK, prior to literature review, subsequently converged on FDA-approved or Phase 2+ clinical programs.

Conclusions. Transcriptomic variance encodes a specific biological signal for therapeutic antigenicity that is orthogonal to driver gene biology and systematically inaccessible to all mean-based approaches tested. Integrating variance-based prioritization into target discovery workflows may substantially expand the accessible space for immunotherapy antigen development.
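The recall metric and the headline odds ratio can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: plain per-gene variance stands in for the TANK score (the abstract does not define it), the toy gene names and expression values are invented for illustration, and the 2x2 table is built by reading the reported 50% and 3.6% recall as 14 and 1 of the 28 gold standard targets. Under that reading the sample odds ratio is exactly 27.0, and a one-sided Fisher exact test gives a p-value close to the reported 0.000071. The abstract does not say which test was used, so this agreement is only suggestive.

```python
from math import comb
from statistics import pvariance


def recall_at_top_fraction(scores, gold, fraction):
    """Share of gold standard genes that fall in the top `fraction` of genes by score."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return len(set(ranked[:k]) & gold) / len(gold)


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical toy data: gene -> expression across four samples.
toy_expression = {
    "G1": [1, 9, 2, 8],
    "G2": [5, 5, 5, 5],
    "G3": [4, 6, 4, 6],
    "G4": [0, 10, 0, 10],
}
# Plain variance as a stand-in score (assumption; not the TANK definition).
toy_scores = {gene: pvariance(values) for gene, values in toy_expression.items()}
toy_gold = {"G4", "G2"}
toy_recall = recall_at_top_fraction(toy_scores, toy_gold, 0.5)

# Assumed table: rows = method (TANK, DEG), columns = (recovered, missed) of 28 targets.
reported_or = odds_ratio(14, 14, 1, 27)        # 27.0
reported_p = fisher_one_sided(14, 14, 1, 27)   # roughly 7.1e-05
```

In the toy data the two highest-variance genes are G4 and G1, so one of the two gold standard genes is recovered at the 50% cutoff. The same function applies unchanged to any score (mean, CV, a differential expression statistic), which is what makes a like-for-like comparison of ranking methods at a fixed threshold possible.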