Path-Probability Models Outperform Point-Estimate Scores for Noncoding GWAS Gene Prioritization
Abstract
Genome-wide association studies identify thousands of disease-associated loci, but translating these to causal genes remains challenging because existing methods collapse complex regulatory pathways into single point-estimate scores. Here we introduce mechanism graphs, a new probabilistic inference object that explicitly represents causal chains from variants through regulatory elements to genes, tissues, and traits while propagating calibrated uncertainty at each step. We combine Sum of Single Effects (SuSiE) fine-mapping with multi-causal colocalization (coloc.susie) and ensemble enhancer–gene linking using Activity-by-Contact (ABC) and promoter capture Hi-C (PCHi-C). On anti-leak holdout benchmarks, path-probability models achieve 76% recall at rank 20 [95% CI: 71–81%] versus 58% [95% CI: 52–64%] for Open Targets Genetics locus-to-gene scores. All modules maintain Expected Calibration Error below 0.05 on held-out benchmarks, enabling principled decision-making under our evaluation protocol. Colocalization signals replicate across independent eQTL studies (effect size correlation r = 0.89). We demonstrate generalizability beyond cardiometabolic traits to neurological, immune, and cancer phenotypes, though calibration may shift with ancestry or tissue coverage changes.
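For readers unfamiliar with the calibration metric cited above, the following is a minimal sketch of how Expected Calibration Error is commonly computed for binary predictions. It is not the authors' implementation: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are assumptions made for illustration, and the abstract does not state which binning scheme was used.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Illustrative ECE with equal-width bins (an assumed scheme, not the paper's).

    probs  : predicted probabilities in [0, 1]
    labels : observed binary outcomes (0 or 1), same length as probs
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    # ECE is the bin-size-weighted mean of |observed positive rate - mean predicted probability|.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(pos_rate - mean_conf)
    return ece
```

Under this definition, a module whose predicted probabilities match the observed frequencies in every bin has an ECE of 0, and the 0.05 threshold quoted in the abstract would correspond to an average gap of at most five percentage points between predicted and observed rates.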