Interpretable biophysical neural networks of transcriptional activation domains separate roles of protein abundance and coactivator binding
Abstract
Deep neural networks have improved the accuracy of many difficult prediction tasks in biology, but it remains challenging to interpret these networks and learn molecular mechanisms. Here, we address the interpretability challenges associated with predicting transcriptional activation domains from protein sequence. Activation domains, regions within transcription factors that drive gene expression, have traditionally been difficult to predict due to their sequence diversity and poor conservation. Multiple deep neural networks can now accurately predict activation domains, but these predictors are difficult to interpret. With the goal of interpretability, we designed simple neural networks that incorporate biophysical models of activation domains. The simplicity of these neural networks allowed us to visualize their parameters and directly interpret what the networks learned. The biophysical neural networks revealed two new ways that the arrangement of residues (i.e. the sequence grammar) in activation domains controls function: 1) hydrophobic residues both increase activation domain strength and decrease protein abundance, and 2) acidic residues control both activation domain strength and protein abundance. Notably, the biophysical neural networks helped us recognize the same signatures in complex interpretations of the deeper neural networks. We demonstrate how combining biophysical and deep neural networks maximizes both prediction accuracy and interpretability, yielding insights into biological mechanisms.