Explaining how mutations affect AlphaFold predictions

Madeleine F. Clore
Joseph F. Thole
Suchetan Dontha
Pramesh Sharma
Naomi Greenberg
Marie-Paule Strub
Mary Starich
Davin Jensen
Brian F. Volkman
Matthew Coudron
Lauren L. Porter

Abstract

Transformer models, neural networks that learn context by identifying relationships in sequential data, underpin many recent advances in artificial intelligence. Nevertheless, their inner workings are difficult to explain. Here, we find that a transformer model within the AlphaFold architecture uses simple, sparse patterns of amino acids to select protein conformations. To identify these patterns, we developed a straightforward algorithm called Conformational Attention Analysis Tool (CAAT). CAAT identifies amino acid positions that affect AlphaFold’s predictions substantially when modified. These effects are corroborated by experiments in several cases. By contrast, modifying amino acids ignored by CAAT affects AlphaFold predictions less, regardless of experimental ground truth. Our results demonstrate that CAAT successfully identifies the positions of some amino acids important for protein structure prediction, narrowing the search space required to predict effective mutations and suggesting a framework that can be applied to other transformer-based neural networks.

Arcadia Science
Jan 12, 2026

AF overpredicted the dimer conformation substantially

It seems pertinent to establish why the dimer conformation is predicted in XCL1. It would be valuable to run a structural alignment of both XCL1 conformations against the AF2/3 training dataset.

This would reveal several things. First, which XCL1 conformations are in the training dataset, if any? Either being present would be considered data leakage. And second, how many hits correspond to each of the conformations?

My hypothesis is that either (a) the XCL1 dimer is present in the training dataset and the chemokine isn't, or (b) neither/both are present, but the dimer yields significantly more hits, creating a dimer preference for XCL1 and all of its derived "ancestors".

Depending on the dataset size (I forget how much clustering the AF folks did), the alignment could feasibly be run with TM-align. Otherwise, Foldseek or another scalable aligner would work.
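As a rough illustration of the hit-counting step, here is a minimal Python sketch that tallies distinct targets per query conformation from aligner output. It assumes Foldseek-style tab-separated rows in the default 12-column order (query, target, ..., evalue, bits). The query names, target names, and numbers below are hypothetical placeholders, not real search results, and the e-value cutoff is an arbitrary choice.

```python
from collections import defaultdict

# Assumed column order (Foldseek default tabular output):
# query, target, fident, alnlen, mismatch, gapopen,
# qstart, qend, tstart, tend, evalue, bits
QUERY_COL, TARGET_COL, EVALUE_COL = 0, 1, 10


def count_hits(rows, max_evalue=1e-3):
    """Return {query: number of distinct targets with evalue <= max_evalue}."""
    targets = defaultdict(set)
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comments
        fields = row.split("\t")
        if float(fields[EVALUE_COL]) <= max_evalue:
            targets[fields[QUERY_COL]].add(fields[TARGET_COL])
    return {query: len(hits) for query, hits in targets.items()}


# Placeholder rows for illustration only (hypothetical names and values).
example_rows = [
    "xcl1_dimer\ttarget_a\t0.31\t60\t40\t1\t1\t60\t5\t64\t1e-5\t120",
    "xcl1_dimer\ttarget_b\t0.28\t58\t41\t1\t2\t59\t3\t60\t2e-4\t90",
    "xcl1_dimer\ttarget_c\t0.15\t40\t33\t2\t5\t44\t9\t48\t0.5\t20",
    "xcl1_chemokine\ttarget_d\t0.35\t62\t40\t0\t1\t62\t1\t62\t1e-6\t140",
]

hit_counts = count_hits(example_rows)
for query, n in sorted(hit_counts.items()):
    print(f"{query}: {n} hit(s)")
```

Comparing the two counts would only be meaningful after restricting the target set to structures that were actually available at the model's training cutoff, which this sketch does not do.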

Arcadia Science
Jan 9, 2026

AF overpredicted the dimer conformation substantially

It might be valuable to check which conformation (or both) was included in the model's original training datasets.

Arcadia Science
Jan 9, 2026

XCL1 attention heads displayed an interaction network unique to the dimer fold (Figure 2B). Using an interpretation strategy originally suggested by the AF team (C), this network is characterized by vertical lines corresponding to interacting amino acids (Figure S5A,B).

It's interesting to see how the key residues in these attention maps interact globally with the entire sequence. This feels somewhat distinct from the results of Zhang et al. on the categorical Jacobian, which picks up strong pairwise patterns between amino acids (predicting the contact map of a folded sequence). I wonder if this pattern is a unique feature of these fold-switching proteins or a general phenomenon in AlphaFold.
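To make the "vertical lines" reading concrete: in an attention map with rows as query positions and columns as attended positions, a vertical line is a column that many rows attend to. The Python sketch below flags such columns in a toy matrix. It is not CAAT and not the preprint's method; the function names, the threshold ratio, and the matrix values are all hypothetical, and it assumes each row sums to 1.

```python
def column_scores(attn):
    """Mean attention received by each column, averaged over all rows."""
    n_rows = len(attn)
    n_cols = len(attn[0])
    return [sum(row[j] for row in attn) / n_rows for j in range(n_cols)]


def vertical_line_columns(attn, ratio=2.0):
    """Columns whose mean attention is at least `ratio` times the uniform level.

    Assumes each row of `attn` sums to 1, so the uniform level is 1 / n_cols.
    The ratio of 2.0 is an arbitrary illustrative cutoff.
    """
    scores = column_scores(attn)
    uniform = 1.0 / len(scores)
    return [j for j, s in enumerate(scores) if s >= ratio * uniform]


# Toy 4x4 map (hypothetical values): every row attends mostly to column 2.
toy_attention = [
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]

flagged = vertical_line_columns(toy_attention)
print("columns attended by most rows:", flagged)
```

A pairwise pattern of the contact-map kind would instead show up as isolated high entries scattered off the diagonal, with no single column dominating, which is the contrast the comment above draws.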

Version published to 10.64898/2025.12.30.697132 on bioRxiv
Jan 4, 2026

Related articles

The Evolution of the AlphaFold Architecture
This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations
Latest version Jan 9, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods
This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026

Quantum-Assisted Refinement of AlphaFold Protein Structures
This article has 1 author:
1. Parham Ghayour
This article has no evaluations
Latest version Dec 31, 2025
