Explaining how mutations affect AlphaFold predictions
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Transformer models, neural networks that learn context by identifying relationships in sequential data, underpin many recent advances in artificial intelligence. Nevertheless, their inner workings are difficult to explain. Here, we find that a transformer model within the AlphaFold architecture uses simple, sparse patterns of amino acids to select protein conformations. To identify these patterns, we developed a straightforward algorithm called Conformational Attention Analysis Tool (CAAT). CAAT identifies amino acid positions that affect AlphaFold’s predictions substantially when modified. These effects are corroborated by experiments in several cases. By contrast, modifying amino acids ignored by CAAT affects AlphaFold predictions less, regardless of experimental ground truth. Our results demonstrate that CAAT successfully identifies the positions of some amino acids important for protein structure, narrowing the search space required to make effective mutations and suggesting a framework that can be applied to other transformer-based neural networks.
Article activity feed
-
AF overpredicted the dimer conformation substantially
It might be valuable to check which conformation (or both) was included in the original model training datasets.
-
XCL1 attention heads displayed an interaction network unique to the dimer fold (Figure 2B). Using an interpretation strategy originally suggested by the AF team (C), this network is characterized by vertical lines corresponding to interacting amino acids (Figure S5A,B).
It's interesting to see how the key residues in these attention maps interact globally with the whole sequence. This feels somewhat distinct from the results of Zhang et al. on the categorical Jacobian, which picks up strong pairwise patterns between amino acids (predicting the contact map of a folded sequence). I wonder whether this pattern is unique to these fold-switching proteins or a general phenomenon in AlphaFold.
-