Hypothesis-driven interpretable neural network for interactions between genes
Abstract
Mechanistic models of genetic interactions are rarely feasible due to a lack of information and computational challenges. Alternatively, machine learning (ML) approaches may predict gene interactions if provided with enough data, but they lack interpretability. Here, we propose an ML approach for interpretable genotype-to-fitness mapping, the Direct-Latent Interpretable Model (D-LIM). The neural network is built on a strong hypothesis: mutations in different genes cause independent effects on phenotypes, which then interact via non-linear relationships to determine fitness. D-LIM predicts interpretable genotype-to-fitness maps with state-of-the-art accuracy for gene-to-gene and gene-to-environment perturbations in deep mutational scanning of a metabolic pathway, a protein-protein interaction system, and yeast mutants for environmental adaptation. The hypothesis-driven structure of D-LIM offers interpretable features reminiscent of mechanistic models: the inference of phenotypes, the identification of trade-offs, and fitness extrapolation outside of the data domain.
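To make the hypothesized structure concrete, the sketch below shows one way such an architecture could be wired up in PyTorch: each gene's mutations are embedded into an independent scalar latent phenotype, and a shared non-linear network combines the phenotypes into a predicted fitness. This is a minimal illustration of the idea stated in the abstract, not the authors' implementation; the class and parameter names (DLIMSketch, n_variants_per_gene, hidden) are assumptions.

```python
import torch
import torch.nn as nn

class DLIMSketch(nn.Module):
    """Illustrative sketch of the hypothesized D-LIM structure: per-gene
    mutations map to independent latent phenotypes, which a shared
    non-linear network combines into a fitness value."""

    def __init__(self, n_variants_per_gene, hidden=32):
        super().__init__()
        # One embedding table per gene: mutation index -> scalar phenotype.
        # Independence across genes is enforced by keeping the tables separate.
        self.phenotypes = nn.ModuleList(
            nn.Embedding(n, 1) for n in n_variants_per_gene
        )
        # Non-linear map from the vector of latent phenotypes to fitness.
        self.fitness = nn.Sequential(
            nn.Linear(len(n_variants_per_gene), hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, genotype):
        # genotype: (batch, n_genes) integer mutation indices, one per gene.
        z = torch.cat(
            [emb(genotype[:, i]) for i, emb in enumerate(self.phenotypes)],
            dim=1,
        )  # (batch, n_genes) latent phenotypes
        return self.fitness(z).squeeze(-1)

# Toy usage: two genes with 5 and 7 observed variants each.
model = DLIMSketch([5, 7])
batch = torch.tensor([[0, 3], [4, 6]])
print(model(batch).shape)  # torch.Size([2])
```

Because each latent phenotype is a learned scalar per gene, it can be read out directly after training, which is what makes this kind of structure interpretable relative to an unconstrained genotype-to-fitness network.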