InsulatorLeak, a mechanism-first pipeline for variant prioritization by predicted CTCF insulator disruption across seven autoimmune diseases

Abstract

Genome-wide association studies have mapped hundreds of autoimmune risk loci, yet fewer than 5% of associated variants have established causal mechanisms. CTCF-bound insulators partition the genome into regulatory domains. When an insulator is disrupted, enhancers can activate genes they do not normally regulate, a phenomenon termed insulator leakage that has been implicated in malignancy and may contribute to autoimmunity. We reasoned that prioritizing variants by predicted insulator disruption, and then validating that prioritization empirically across independent cohorts and diseases, would yield candidates suitable for functional follow-up and target selection.

InsulatorLeak combines SuSiE fine-mapping, Enformer-predicted CTCF disruption (Scaled Average Difference, SAD), locus-matched null calibration for empirical p-values, chromatin enrichment (ChromHMM, Hi-C), eQTL and pQTL colocalization, and benchmarking against FUMA, Sei, and Boltz-2. We applied the pipeline to 14 non-MHC autoimmune loci across seven diseases (multiple sclerosis [MS], inflammatory bowel disease [IBD], rheumatoid arthritis [RA], type 1 diabetes [T1D], systemic lupus erythematosus [SLE], psoriasis, and atopic dermatitis [AD]) using 19 GWAS datasets with more than 2.5 million participants in total.

In MS, TNFRSF1A replicated in 4/6 cohorts (p = 0.022), and IL2RA reached p = 0.006 in the IMSGC MS Chip dataset (N = 115,803). In pan-autoimmune replication across the seven diseases, IL7R replicated in 6/7 expansion cohorts (p = 0.011–0.052) and TNFRSF1A in 5/7 (p = 0.011–0.033), suggesting that insulator disruption is a mechanism shared across autoimmune diseases. Thirteen of the 14 loci replicated in at least two ancestry groups; IL12AB was significant in all four ancestries, and TYK2 reached FDR q = 0.040 in the South Asian group. Chromatin enrichment was strongest at drug-target loci (in MS: CD58 21×, IL2RA 23×, IL7R 12×, TNFRSF1A 5.3×; in IBD: CD40 8.6×, IL7R 34×). pQTL colocalization yielded cognate validation at CD58 (0.996), CD40 (0.958), CD6 (0.79 in IBD), and IL2RA (0.79).
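The chromatin enrichment values above (for example, 21× at CD58) are reported as fold changes, but the abstract does not give the formula. One common definition, assumed here rather than taken from the paper, is the fraction of prioritized variants that fall in a chromatin annotation (such as a ChromHMM state) divided by the same fraction among background variants, with a small pseudocount so the ratio stays finite when counts are zero. The function `fold_enrichment` and its arguments are hypothetical; this is a minimal sketch, not the InsulatorLeak implementation.

```python
def fold_enrichment(n_hit_in_state: int, n_hit_total: int,
                    n_bg_in_state: int, n_bg_total: int,
                    pseudocount: float = 0.5) -> float:
    """Fold enrichment of prioritized variants in one chromatin state.

    Hypothetical definition (assumed, not taken from the paper):
        (hits in state / all hits) / (background in state / all background)
    The pseudocount is added to both the in-state and out-of-state counts,
    i.e. 2 * pseudocount is added to each total, so the ratio stays finite.
    """
    if n_hit_total <= 0 or n_bg_total <= 0:
        raise ValueError("totals must be positive")
    if not 0 <= n_hit_in_state <= n_hit_total:
        raise ValueError("n_hit_in_state must lie in [0, n_hit_total]")
    if not 0 <= n_bg_in_state <= n_bg_total:
        raise ValueError("n_bg_in_state must lie in [0, n_bg_total]")
    hit_frac = (n_hit_in_state + pseudocount) / (n_hit_total + 2 * pseudocount)
    bg_frac = (n_bg_in_state + pseudocount) / (n_bg_total + 2 * pseudocount)
    return hit_frac / bg_frac


# Illustrative counts only (not from the paper): 10 of 20 prioritized
# variants versus 100 of 2,000 background variants fall in the state.
print(fold_enrichment(10, 20, 100, 2000, pseudocount=0.0))  # 10.0
```

With no pseudocount the example reduces to 0.5 / 0.05 = 10; the default pseudocount of 0.5 only changes the result noticeably when the counts are small.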
Both MS and IBD showed the full validation stack, from fine-mapping through pQTL colocalization. Overlap with FUMA lead SNPs was only 2/120, indicating that the two approaches are complementary rather than redundant. In RA, four loci reached p < 0.05 (CLEC16A, TNFRSF1A, STAT3, BACH2). Results for T1D, SLE, psoriasis, and AD are reported in Supplementary Table S8.

In a disease-specific locus extension (GAIM; General Autoimmune Insulator Mechanism), we applied the identical pipeline (SuSiE, Enformer CTCF SAD, and locus-matched nulls drawn from each indication's own GWAS) to the top 14 non-MHC loci of each disease in five cohorts with large summary statistics: an atopic dermatitis (AD) meta-analysis, T1D (Chiou et al. 2021), RA (FinnGen), IBD (de Lange et al. 2017), and SLE (Bentham et al. 2015). This tests insulator disruption at regions prioritized by each disease, not only at the 14 MS-derived loci. Supplementary Table S15 lists, for every locus–cohort pair, the minimum empirical p-value, Benjamini–Hochberg q-value, variant identifier, and variant count. Representative indication-native signals include IL33 and STAT6 (AD), CCR7 and CTLA4 (T1D), TNFAIP3 (RA and SLE), RGS14 and TNFSF15 (IBD), and ITGAM (SLE).

InsulatorLeak provides mechanism-first target validation with empirical prioritization across autoimmune diseases, and the pipeline generalizes from MS to six additional diseases. Loci in pathways targeted by approved or formerly approved therapeutics (IL12AB/ustekinumab, TYK2/deucravacitinib, TNFRSF1A/etanercept, IL2RA/daclizumab) show a consistent insulator-disruption signal, aligning mechanism-first prioritization with known drug targets. At shared loci, multiple diseases show insulator disruption with disease-specific credible sets, consistent with insulator disruption acting as a general autoimmune mechanism rather than reflecting transfer of MS-specific variants. The modular design permits application to other complex traits. Code is available upon request.
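Supplementary Table S15 reports, for each locus–cohort pair, a minimum empirical p-value, a Benjamini–Hochberg q-value, a variant identifier, and a variant count. The abstract does not describe how these are computed, so the following is only a minimal sketch under stated assumptions: each candidate variant's |SAD| is compared against locus-matched null variants with the add-one estimator (r + 1)/(n + 1), the locus-level statistic is the minimum over candidates, and BH is applied across all locus–cohort pairs. All names (`LocusResult`, `empirical_p`, `summarize_locus`, `bh_qvalues`, `add_qvalues`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LocusResult:
    """One row of a hypothetical locus-cohort summary table."""
    locus: str
    cohort: str
    best_variant: str
    min_p: float
    n_variants: int
    q: float = float("nan")


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Add-one empirical p-value: (r + 1) / (n + 1).

    r counts locus-matched null variants whose |score| is at least the
    observed |score|; larger |SAD| is assumed to mean stronger disruption.
    """
    if len(null_scores) == 0:
        raise ValueError("null_scores must be non-empty")
    obs = abs(observed)
    r = sum(1 for s in null_scores if abs(s) >= obs)
    return (r + 1) / (len(null_scores) + 1)


def summarize_locus(locus: str, cohort: str, sad_scores: dict[str, float],
                    null_scores: Sequence[float]) -> LocusResult:
    """Keep the candidate variant with the smallest empirical p at a locus."""
    if not sad_scores:
        raise ValueError("sad_scores must be non-empty")
    pvals = {v: empirical_p(s, null_scores) for v, s in sad_scores.items()}
    best = min(pvals, key=pvals.__getitem__)
    return LocusResult(locus, cohort, best, pvals[best], len(sad_scores))


def bh_qvalues(pvalues: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg q-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Step up from the largest p: q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def add_qvalues(rows: list[LocusResult]) -> list[LocusResult]:
    """Attach BH q-values computed across all locus-cohort rows."""
    for row, q in zip(rows, bh_qvalues([r.min_p for r in rows])):
        row.q = q
    return rows


# Illustrative scores only (not from the paper); one shared null for brevity.
null = [0.01, 0.02, -0.03, 0.05, 0.04, -0.01, 0.02, 0.03, 0.06]
rows = add_qvalues([
    summarize_locus("LOCUS_A", "cohort_1", {"var_a": 0.055, "var_b": -0.08}, null),
    summarize_locus("LOCUS_B", "cohort_1", {"var_c": 0.055}, null),
])
for r in rows:
    print(r.locus, r.best_variant, r.min_p, r.n_variants, r.q)
```

The add-one estimator keeps every p-value strictly positive, so with n null variants the smallest attainable value is 1/(n + 1); resolving an empirical p-value as small as 0.006 requires at least 166 null variants. In this sketch the locus-level minimum is not itself recalibrated for the number of candidates per locus. For production use, statsmodels' `multipletests(pvals, method="fdr_bh")` implements the same BH adjustment.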

Article activity feed