Biologically-informed Interpretable Deep Learning Framework for Phenotype Prediction and Gene Interaction Detection
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The detection of epistatic effects has significant potential to enhance understanding of the genetic basis of complex traits, but statistical epistatic analysis methods are complex and labour intensive. In recent years, Deep Neural Networks (DNNs) have emerged as a powerful tool for modelling arbitrarily complex genetic interactions in relation to a phenotype; however, their utility is often limited by the challenge of interpreting their predictive reasoning. Although DNN interpretation methods exist, they are typically not designed for genomic applications, leading to hard-to-understand outputs with limited relevance to the field. To address this gap, we introduce GENEPAIR – a novel DNN interpretation framework designed specifically for genomic data, aimed at detecting putative associated gene-gene interactions for a phenotype of interest. Our approach offers several key advantages including model agnositicity, robustness to sample- and variant-level data variance, and flexibility to integrate varied domain knowledge into interpretable features. We demonstrate the efficacy of our method by applying it to a DNN trained on genetic variant data to predict Body Mass Index (BMI). The results of the analysis not only reveal single gene influences in close alignment with literature but also uncover previously unreported gene-gene interactions, demonstrating its significant potential for genomic discovery.
Author summary
Understanding how genes interact improves our understanding of how genetic pathways influence common diseases, potentially leading to new treatments. However, identifying these interactions is particularly challenging when a large number of genes are involved. Machine learning models, such as Deep Neural Networks (DNNs), excel at detecting complex patterns in data, but interpreting these patterns from trained networks remains a significant challenge. We have developed a novel framework to extract insights from a DNN trained to predict a trait, revealing how genes in the dataset may interact to influence the model’s predictions. Our approach is easy to incorporate or adapt to biological prior knowledge compared to existing methods, offering a powerful tool for discovering previously unknown gene-gene interactions.