A deep learning model recapitulates position specific effects of plant regulatory sequences and suggests genes under complex regulation
Abstract
Deep neural networks can be trained to predict gene expression directly from genomic sequence, thereby implicitly learning regulatory sequence patterns from scratch and minimizing the bias imposed by prior assumptions. A challenging yet promising prospect is the extraction of novel insights into gene-regulatory mechanisms by probing and interpreting such gene expression models. Using a branched convolutional neural network architecture trained on promoter and terminator sequences of the allopolyploid Brassica napus and the closely related model organism Arabidopsis thaliana, we show that deep learning models can successfully capture the positional binding preferences of some transcription factor families without having been trained on transcription factor binding data. Furthermore, we show that our model not only detected local sequence patterns but was also able to determine their function based on their positional context. We also found that increased prediction error correlated with additional, more distal or epigenetic regulatory input. On the prediction task itself, we were able to match or outperform all previously published regression models for gene expression prediction in plants. Our results demonstrate that deep learning can be used to understand the regulatory architecture of gene expression in plants. A better understanding of gene regulation in the context of polyploid genomes is of particular economic importance because polyploid genomes are prevalent among major crops. In the future, we hope that such models may facilitate the targeted engineering of gene regulation in crops.
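To make the idea of a branched, position-aware sequence model concrete, the following is a minimal sketch in plain Python and is not the authors' architecture. It assumes two branches (promoter and terminator), each applying one convolutional filter to a one-hot encoded sequence, with a linear readout that keeps one weight per position so that the same motif can contribute differently depending on where it occurs. All function names, the hand-built motif filters (standing in for learned filters), and the toy sequences and weights are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_kernel(motif):
    """Build a filter that scores +1 per matching base (a stand-in for a learned filter)."""
    return one_hot(motif)


def conv1d_relu(encoded, kernel):
    """Slide one filter along the sequence (valid padding) and apply ReLU.

    Returns one activation per start position, so positional information is kept.
    """
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        activations.append(max(score, 0.0))
    return activations


def predict(promoter, terminator, prom_kernel, term_kernel, weights, bias):
    """Two-branch model: convolve each region separately, concatenate, linear readout.

    `weights` holds one value per position of the concatenated activation maps,
    which is what lets the readout depend on where a motif occurs.
    """
    features = conv1d_relu(one_hot(promoter), prom_kernel)
    features += conv1d_relu(one_hot(terminator), term_kernel)
    if len(weights) != len(features):
        raise ValueError("expected one weight per activation position")
    return sum(w * f for w, f in zip(weights, features)) + bias


# Toy example: a TATA-like motif in the promoter, a poly(A)-signal-like motif
# in the terminator. Sequences and weights are invented for illustration.
promoter_map = conv1d_relu(one_hot("GGTATAGG"), motif_kernel("TATA"))
toy_weights = [0.0, 0.0, 1.0, 0.0, 0.0] + [0.0] * 5
toy_prediction = predict(
    "GGTATAGG", "CCAATAAACC",
    motif_kernel("TATA"), motif_kernel("AATAAA"),
    toy_weights, bias=0.5,
)
```

In this sketch only the promoter position carrying the full motif match has a non-zero readout weight, so shifting the motif by one base would change the prediction. A trained network would learn both the filters and the positional weighting from expression data rather than having them set by hand.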