A pan-cancer regulatory atlas of 6,983 GWAS variants prioritizes recurrent regulatory annotations and candidate programs at cancer risk loci
Abstract
Genome-wide association studies have identified thousands of cancer risk variants in non-coding regions, yet their regulatory mechanisms remain largely uncharacterized. Here we present a regulatory annotation atlas of 6,983 genome-wide significant variants across 23 cancer types, scored using multimodal AlphaGenome predictions and integrated with ENCODE-4, Roadmap Epigenomics, and JASPAR 2024 annotations. Most variants (70.5%) fall outside annotated cis-regulatory elements; 27.7% overlap enhancers and 1.4% overlap promoters. Comparison with 6,626 position-matched eQTL control variants suggests that enhancer-classified variants carry 1.86-fold higher predicted effects (P = 10⁻⁹⁴) and promoter variants 7.84-fold (P = 2.5 × 10⁻¹⁹). A composite prioritization score (RegVar-basic, excluding GWAS-derived pleiotropy and TF disruption, AUC = 0.650; RegVar-full, AUC = 0.675) outperforms CADD (0.499) and LINSIGHT (0.558) in this cancer-gene discrimination benchmark. Within-locus ranking across 2,626 GTEx DAP-G eQTL credible sets shows that RegVar identifies the highest-posterior-probability variant in 47.3% of loci (P = 7.0 × 10⁻¹³), while CADD performs at chance. Predicted target genes show 67.7% concordance with GTEx eQTL assignments. Permutation-controlled motif analysis highlights NFKB1/NF-κB, STAT1, IRF1, and ARNT as exploratory permutation-enriched candidate transcription factors at cancer risk loci. This atlas provides a resource for interpreting non-coding cancer susceptibility variants. Because AlphaGenome uses expression-related training data, GTEx-based validations should be interpreted as partially orthogonal rather than fully independent.