Regulatory Pathogenicity Is Mechanistically Heterogeneous: A Taxonomy of Activity-, Architecture-, and Coverage-Driven Blind Spots

Sergey V. Boyko

Abstract

Background. Current variant interpretation tools assign pathogenicity along a single axis, typically sequence conservation or predicted functional impact. Collapsing everything onto one axis obscures mechanistically distinct classes of regulatory effect that require different computational approaches and different experimental validations. Whether regulatory pathogenicity decomposes into separable mechanistic axes, and how large the resulting blind spots are, has not been systematically assessed.

Results. We propose a five-class taxonomy of regulatory pathogenicity: (A) activity-driven, where variants alter enhancer or promoter function detectable by reporter assays; (B) architecture-driven, where variants disrupt 3D chromatin contact topology detectable by structural simulation; (C) mixed, combining both mechanisms; (D) coverage gap, where current tools lack scoring capability; and (E) tissue-mismatch artifact, where apparent signals reflect incorrect tissue context. We classify 21 cases encompassing 30,318 ClinVar variants across 9 clinically important genomic loci using ARCHCODE, a loop-extrusion-based structural pathogenicity engine, integrated with VEP, CADD, MPRA cross-validation, and CRISPRi benchmarking. We show that 25 high-confidence and 29 candidate architecture-driven variants (Class B) are systematically missed by sequence-based tools: cross-locus weighted NMI(ARCHCODE, VEP) = 0.026; NMI at tissue-matched HBB = 0.495 (95% CI: 0.433–0.560). These variants cluster within 434 bp of tissue-matched enhancers (p = 2.51 × 10⁻³¹), 58-fold closer than activity-driven variants (25,138 bp), and return null results in both MPRA and CRISPRi screens, consistent with a contact-disruption mechanism rather than an element-activity mechanism. An additional 207 coverage-gap variants (Class D) are unscored by VEP but detectable by structural simulation. Together, architecture-driven and coverage-gap variants account for 261 structural blind spots, of which 79.3% reflect tool absence (Class D) and 20.7% reflect true mechanistic orthogonality (Class B).

Tissue-mismatch analysis (EXP-003) demonstrates that the architecture-driven signal collapses roughly 700-fold in mismatched tissue (matched delta = 0.00357 vs. mismatch delta = 5.04 × 10⁻⁶), establishing tissue context as a necessary condition for Class B detection. A seven-locus tissue-match panel using ENCODE ChIP-seq data reveals four distinct outcome modes: positive amplification (SCN5A 1.37×, LDLR 1.43×), tail amplification (MLH1 2.0×), null (BRCA1 0.99×), and reverse effect (CFTR 0.60×, TERT 0.39×, TP53 0.18×), with reverse cases decomposing into overparameterization, enhancer loss, and enhancer dilution sub-mechanisms. Eight canonical cases from the literature, including TAD boundary disruption (Lupiáñez et al. 2015), insulated neighborhood disruption (Hnisz et al. 2016), and enhancer hijacking (Gröschel et al. 2014), independently validate the taxonomy across limb malformations, leukemia, and medulloblastoma.

Conclusions. Single-axis scoring is an inadequate abstraction for regulatory variant interpretation. Mechanistic decomposition reveals that architecture-driven pathogenicity, which accounts for 20.7% of structural blind spots, requires dedicated 3D chromatin modeling that no current sequence-based tool provides. We propose that variant interpretation frameworks should explicitly assign a mechanistic class before scoring, enabling targeted experimental validation and reducing systematic blind spots in clinical genetics.
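
The headline proportions and fold-changes in the abstract follow from a handful of reported counts, and the arithmetic can be retraced in a few lines. The sketch below is only a consistency check on the numbers quoted above; the variable names are illustrative and are not taken from ARCHCODE or any other tool named in the abstract.

```python
# Consistency check on figures quoted in the abstract.
# All variable names are illustrative (hypothetical); only the numeric
# inputs come from the abstract itself.

# Class B (architecture-driven) and Class D (coverage-gap) variant counts.
class_b_high_confidence = 25
class_b_candidate = 29
class_d = 207

class_b = class_b_high_confidence + class_b_candidate
blind_spots = class_b + class_d

# Share of structural blind spots attributed to each class.
share_d = class_d / blind_spots
share_b = class_b / blind_spots

# Reported distances to tissue-matched enhancers, in base pairs.
activity_distance_bp = 25_138
architecture_distance_bp = 434
distance_fold = activity_distance_bp / architecture_distance_bp

# Tissue-mismatch experiment (EXP-003): matched vs. mismatched delta.
matched_delta = 0.00357
mismatch_delta = 5.04e-6
tissue_fold = matched_delta / mismatch_delta

print(f"Class B variants:        {class_b}")
print(f"Structural blind spots:  {blind_spots}")
print(f"Class D share:           {share_d:.1%}")
print(f"Class B share:           {share_b:.1%}")
print(f"Enhancer proximity fold: {distance_fold:.1f}")
print(f"Tissue collapse fold:    {tissue_fold:.0f}")
```

Under these inputs the 54 Class B and 207 Class D variants sum to the 261 blind spots reported, the shares come out at 79.3% and 20.7%, the distance ratio rounds to 58, and the matched-to-mismatched ratio is a little above 700, in line with the "700-fold" figure.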

Version published to 10.21203/rs.3.rs-9090074/v1 on Research Square
Mar 12, 2026

Related articles

InsulatorLeak, a mechanism-first pipeline for variant prioritization by predicted CTCF insulator disruption across seven autoimmune diseases

This article has 1 author:
1. Navya Shah
This article has no evaluations
Latest version Apr 10, 2026

Clinical significance of mRNA nonstop decay in rare disease diagnosis and recommendations for its application in variant classification

This article has 17 authors:
1. Yue Zhou
2. Dandan He
3. Nikita Mehta
4. Christian C. Taborda
5. Robert Rigobello
6. Morgan Driver
7. John Lattier
8. Ning Liu
9. Yue Wang
10. David Wu
11. Lucy A. Godley
12. Liesbeth Vossaert
13. Xiaonan Zhao
14. Linyan Meng
15. Christine M. Eng
16. Fan Xia
17. Xi Luo
This article has no evaluations
Latest version Mar 26, 2026

MobiDeep: an AI-based meta-score for scoring non-coding DNA variations

This article has 18 authors:
1. Abdelhakim Bouazzaoui
2. Jean-Madeleine de Sainte Agathe
3. Simon Cabello-Aguilar
4. Ophélie Evrard
5. Juliette Nectoux
6. Marina Konyukh
7. Leila Qebibo
8. Thibault Coste
9. Sandrine M. Caputo
10. Perrine Brunelle
11. Yohann Jourdy
12. Cécile Rouzier
13. Mireille Cossée
14. Charles Van Goethem
15. Olivier Ardouin
16. Vasiliki Kalatzis
17. Anne-Françoise Roux
18. David Baux
This article has no evaluations
Latest version Mar 11, 2026
