Two Previously Unreported Prostate Cancer Gene Candidates Identified Through Governed Multi-Omics Screening of TCGA-PRAD

Abstract

Most published TCGA-PRAD analyses report a short list of differentially expressed genes selected by a single statistical metric. Here we describe a governed multi-omics screening pipeline that filters all 19,010 genes in the TCGA-PRAD cohort through four independent and non-compensating quality gates: statistical significance, cross-validation reproducibility, bootstrap sign consistency, and data completeness. Of 19,010 genes, 942 (5.0%) passed all four gates. Ten of these matched established prostate cancer genes in COSMIC and the published literature, confirming the pipeline's sensitivity. Cross-referencing the remaining 932 against PubMed identified two genes with no prior prostate cancer association: DNAH5 (Dynein Axonemal Heavy Chain 5), an axonemal dynein motor protein with 35 publications in other cancer contexts but none in prostate, showing 4.4-fold overexpression (adj. p = 1.22 × 10^-17); and PRR36 (Proline Rich 36), a gene with no publications in any cancer context, showing 3.7-fold overexpression (adj. p = 2.57 × 10^-21). Both candidates replicate with the same direction and significance in the independent MSKCC cohort (GSE21034). A targeted PubMed audit of seven canonical axonemal dynein and ciliary motor genes (DNAI1, DNAI2, DNALI1, NME8, DNAL1, CCDC114, RSPH4A) returned no prostate cancer publications for any of them, indicating that this protein family is essentially unstudied in the prostate context. Notably, DNAH5 is overexpressed while all 10 of its high-confidence STRING partners (scores 0.935-0.997), all axonemal/ciliary components, are absent from our 942-gene candidate set, a pattern inconsistent with coordinated dysregulation of the canonical ciliary program.
These results demonstrate that multi-evidence gating with strict non-compensation can identify reproducible candidates overlooked by conventional single-metric screens, and can surface biological patterns, such as isolated overexpression decoupled from a gene's known interactome, that warrant mechanistic follow-up.
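
As an illustration of what "non-compensating" gating could mean in practice, the following minimal Python sketch requires every gate to pass on its own, so that an extremely small adjusted p-value cannot offset poor data completeness. It is not the authors' implementation: the abstract does not state the cutoffs or how each metric is scored, so the field names, the four threshold constants, and the gene labels GENE_A and GENE_B are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneEvidence:
    """Per-gene evidence for the four gates (field names are assumptions)."""
    gene: str
    adj_p: float               # multiple-testing-adjusted p-value
    cv_reproducibility: float  # fraction of cross-validation folds reproducing the effect
    sign_consistency: float    # fraction of bootstrap resamples with the same effect sign
    completeness: float        # fraction of samples with non-missing measurements


# Hypothetical cutoffs, chosen only for illustration; the abstract gives none.
ADJ_P_MAX = 0.05
CV_MIN = 0.80
SIGN_MIN = 0.95
COMPLETENESS_MIN = 0.90


def gate_results(ev: GeneEvidence) -> dict[str, bool]:
    """Evaluate each gate independently and report pass/fail per gate."""
    return {
        "significance": ev.adj_p <= ADJ_P_MAX,
        "cv_reproducibility": ev.cv_reproducibility >= CV_MIN,
        "sign_consistency": ev.sign_consistency >= SIGN_MIN,
        "completeness": ev.completeness >= COMPLETENESS_MIN,
    }


def passes_all_gates(ev: GeneEvidence) -> bool:
    """Non-compensating rule: a logical AND over gates, with no weighted score."""
    return all(gate_results(ev).values())


def screen(evidence: list[GeneEvidence]) -> list[str]:
    """Return the names of genes that clear every gate, in input order."""
    return [ev.gene for ev in evidence if passes_all_gates(ev)]


if __name__ == "__main__":
    # Made-up values: GENE_B has the stronger p-value but fails completeness.
    candidates = [
        GeneEvidence("GENE_A", adj_p=1e-20, cv_reproducibility=0.90,
                     sign_consistency=0.99, completeness=0.98),
        GeneEvidence("GENE_B", adj_p=1e-30, cv_reproducibility=1.00,
                     sign_consistency=1.00, completeness=0.50),
    ]
    for ev in candidates:
        print(ev.gene, gate_results(ev))
    print("passed:", screen(candidates))
```

A weighted composite score would behave differently here: GENE_B's very strong significance could pull its total above a cutoff despite half of its measurements being missing, which is the kind of compensation the strict conjunction rules out.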
