ConfuseNN: Interpreting convolutional neural network inferences in population genomics with data shuffling

Abstract

Convolutional neural networks (CNNs) have become powerful tools for population genomic inference, yet understanding which genomic features drive their performance remains challenging. We introduce ConfuseNN, a method that systematically shuffles input haplotype matrices to disrupt specific population genetic features. By sequentially removing signals from linkage disequilibrium, allele frequency, and other population genetic patterns in test data, we measure how much each feature contributes to CNN performance. We applied ConfuseNN to three published CNNs for demographic history and selection inference, confirming the importance of specific data features and identifying limitations of network architecture and of simulated training and testing data design. ConfuseNN provides an accessible, biologically motivated framework for interpreting CNN behavior across different tasks in population genetics, helping bridge the gap between powerful deep learning approaches and traditional population genetic theory.
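
The shuffling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the ConfuseNN implementation: the function names, the matrix layout (rows are haplotypes, columns are sites, entries are 0/1 alleles), and the choice of these three shuffles are assumptions made here for illustration only. Each shuffle keeps some summary of the data intact while scrambling another, which is what would let a drop in network performance be attributed to the scrambled feature.

```python
import numpy as np


def shuffle_within_sites(hap, rng):
    """Permute alleles among haplotypes independently at each site (column).

    Per-site allele counts are unchanged, but associations between sites
    (linkage disequilibrium) are scrambled. Hypothetical helper.
    """
    return rng.permuted(hap, axis=0)


def shuffle_within_haplotypes(hap, rng):
    """Permute alleles among sites independently within each haplotype (row).

    Each haplotype keeps its number of derived alleles, but per-site
    allele frequencies are scrambled. Hypothetical helper.
    """
    return rng.permuted(hap, axis=1)


def shuffle_site_order(hap, rng):
    """Reorder whole columns, keeping each column's contents intact.

    Per-site allele counts are kept, but the positional arrangement of
    sites along the sequence is lost. Hypothetical helper.
    """
    return hap[:, rng.permutation(hap.shape[1])]


# Toy haplotype matrix: 8 haplotypes by 20 sites (random data, assumption).
rng = np.random.default_rng(0)
hap = rng.integers(0, 2, size=(8, 20))

no_ld = shuffle_within_sites(hap, rng)
no_freq = shuffle_within_haplotypes(hap, rng)
reordered = shuffle_site_order(hap, rng)

# Check which summaries each shuffle leaves untouched.
print((no_ld.sum(axis=0) == hap.sum(axis=0)).all())    # per-site counts
print((no_freq.sum(axis=1) == hap.sum(axis=1)).all())  # per-haplotype counts
```

In a workflow of this kind, each shuffled test set would be passed to an already trained network and its accuracy or error compared with that on the unshuffled test set. The evaluation step is omitted here because it depends on the specific network being examined.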
