Data-Driven Symbolic Higher-Order Epistasis Discovery with Kolmogorov-Arnold Networks
Abstract
Many human diseases are polygenic conditions that arise from complex interactions among multiple genes at different loci, yet most Genome-Wide Association Studies (GWAS) consider only the main additive effects of single nucleotide polymorphisms (SNPs), resulting in a missing heritability problem for some complex traits. Identifying higher-order non-additive interactions, or epistasis, could help fill this gap, but doing so is computationally difficult because of the massive search space involved. Current epistasis detection approaches struggle with non-Cartesian higher-order interactions and lack inherent explainability. We present a novel deep learning (DL) approach, EPIstasis Discovery with Kolmogorov-Arnold Networks (EPIK), a data-driven, modular, and symbolically representable framework. We also introduce a novel approach for detecting higher-order XOR interactions (a non-Cartesian type), which is used in EPIK’s XOR detection module. EPIK slightly outperforms other DL approaches in average F1 score on a simulated benchmark of pure epistatic interactions. It also outperforms other general, traditional epistasis detection approaches on simulated mixed-epistasis datasets and on real-world GWAS datasets of Arabidopsis thaliana. Finally, EPIK recovers a known gene interaction between MAPT and WNT3 for Parkinson’s disease (PD) while also suggesting a more complex interaction among MAPT, WNT3, and another gene, KANSL1.
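To make the difficulty of higher-order XOR interactions concrete, the following minimal sketch shows why such an interaction is invisible to any analysis that looks at fewer loci than the interaction involves. It is an illustration only, not EPIK's detection module: it assumes three hypothetical SNPs coded as binary carrier indicators (0/1) rather than the usual 0/1/2 allele counts, equal genotype frequencies, and a noise-free phenotype equal to the parity of the three indicators. The function names are invented for this example.

```python
from itertools import combinations, product

def xor_phenotype(genotype):
    """Toy phenotype: parity (XOR) of binary carrier indicators (assumed coding)."""
    return sum(genotype) % 2

# Enumerate every combination of three binary SNP indicators with equal weight.
genotypes = list(product([0, 1], repeat=3))
phenotypes = [xor_phenotype(g) for g in genotypes]

def conditional_means(idx):
    """Mean phenotype within each genotype cell of the SNP subset `idx`."""
    cells = {}
    for g, y in zip(genotypes, phenotypes):
        key = tuple(g[i] for i in idx)
        cells.setdefault(key, []).append(y)
    return {key: sum(ys) / len(ys) for key, ys in cells.items()}

# Single SNPs and SNP pairs: every cell has the same mean phenotype,
# so neither a marginal test nor a pairwise test sees any signal.
lower_order = {
    idx: conditional_means(idx)
    for k in (1, 2)
    for idx in combinations(range(3), k)
}

# All three SNPs together: each cell determines the phenotype exactly.
full_order = conditional_means((0, 1, 2))

for idx, means in lower_order.items():
    print(idx, means)
print((0, 1, 2), full_order)
```

In this toy setting the signal appears only when all three loci are examined jointly, so a search that builds up from significant single SNPs or pairs has nothing to build on. That is one reason the search space for higher-order epistasis grows so quickly.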