Data-Driven Symbolic Higher-Order Epistasis Discovery with Kolmogorov-Arnold Networks

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Many human diseases are polygenic conditions that arise from a complex interplay of interactions between multiple genes at different loci, but currently most Genome-Wide Association Studies (GWAS) largely only consider the main additive effects of single nucleotide polymorphisms (SNPs), resulting in a missing heritability problem in some complex traits. Identifying non-additive interactions, or epistasis, at a higher-order could aid in filling this gap, but it is computationally difficult due to the massive search space involved. Current epistasis detection approaches struggle with noncartesian higher order interactions and lack inherent explainability. We present a novel deep learning (DL) approach, EPIstasis Discovery with Kolmogorov-Arnold Networks (EPIK), a data-driven, modular, and symbolically representable framework. We also introduce a novel approach for higher-order XOR (a non-Cartesian type) interaction detection, utilized in EPIK’s XOR detection module. EPIK slightly outperforms other DL approaches on simulated pure epistasis interactions benchmark in average F1 score. It outperforms other, general, traditional epistasis detection approaches on simulated mixed epistasis detection datasets and real-world GWAS datasets of Arabidopsis Thaliana. Finally, EPIK recovers a known gene interaction between MAPT and WNT3 for Parkinson’s Disease (PD) while also suggesting a more complex interaction between MAPT, WNT3, and another gene, KANSL1.

Article activity feed