A machine-learning framework to characterize functional disease architectures and prioritize disease variants
Abstract
Modeling disease effect sizes from genome-wide association studies (GWAS) is critical both for advancing our understanding of the functional architecture of human disease and for providing informative priors that improve the prioritization of potentially causal variants. Here, we introduce the variant-to-disease (V2D) framework, an approach that leverages machine-learning algorithms to model disease effect sizes from posterior effect estimates obtained via genome-wide fine-mapping and from functional annotations. We benchmarked the V2D framework using simulations and real-data analyses, demonstrating that it provides reliable estimates of the functional enrichment of heritability (h²). By applying the V2D framework with linear trees to 15 UK Biobank traits, we identified non-linear relationships between constraint and regulatory annotations, highlighting constrained regulatory variants as the main functional component of disease functional architecture (h² enrichment = 17.3 ± 1.0x across 79 independent GWAS). By applying the V2D framework with neural networks, we developed GWAS prioritization scores that were highly enriched in common-variant h² (20.6 ± 0.7x for the top 1% of scores). These scores outperformed existing prioritization scores in analyses of different GWAS datasets, were transportable to gene expression and non-European datasets, and improved variant prioritization in GWAS fine-mapping studies.
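The enrichment figures quoted above (e.g. h² enrichment for the top 1% of scores) follow the usual definition: the share of total h² captured by a set of variants divided by the share of variants in that set. The sketch below illustrates that arithmetic under stated assumptions; it is not the authors' V2D code. It assumes that per-variant h² is approximated by the posterior second moment of each standardized effect from fine-mapping, E[β²] = Var(β) + E[β]². The function names `posterior_second_moment` and `h2_enrichment_top_fraction` are hypothetical.

```python
import numpy as np


def posterior_second_moment(post_mean, post_var):
    """Per-variant E[beta^2] = Var(beta) + E[beta]^2 (assumed proxy for per-variant h2)."""
    post_mean = np.asarray(post_mean, dtype=float)
    post_var = np.asarray(post_var, dtype=float)
    return post_var + post_mean ** 2


def h2_enrichment_top_fraction(scores, per_variant_h2, top_fraction=0.01):
    """h2 enrichment of the top-scoring variants (hypothetical helper).

    enrichment = (share of total h2 in the top set) / (share of variants in the top set)
    """
    scores = np.asarray(scores, dtype=float)
    per_variant_h2 = np.asarray(per_variant_h2, dtype=float)
    if scores.shape != per_variant_h2.shape:
        raise ValueError("scores and per_variant_h2 must have the same shape")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")

    m = scores.size
    n_top = max(1, int(np.ceil(top_fraction * m)))
    # Indices of the n_top highest scores (ties are broken arbitrarily).
    top_idx = np.argsort(scores)[::-1][:n_top]

    h2_share = per_variant_h2[top_idx].sum() / per_variant_h2.sum()
    variant_share = n_top / m
    return h2_share / variant_share


if __name__ == "__main__":
    # Toy data: 4 variants, so the top 25% is the single highest-scoring variant.
    h2 = posterior_second_moment([0.5, 0.0, 0.0, 0.0], [0.25, 0.125, 0.25, 0.125])
    print(h2_enrichment_top_fraction([0.9, 0.1, 0.5, 0.3], h2, top_fraction=0.25))
```

In the toy example, the top-scoring variant holds half of the total h² while making up a quarter of the variants, which gives an enrichment of 2.0. A value of 1.0 means no enrichment. In practice, enrichment would be computed over many variants and aggregated across GWAS with standard errors. The sketch does not attempt that step.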