Mechanistic evidence that motif-gated domain recognition drives contact prediction in protein language models
Abstract
Protein language models (pLMs) achieve state-of-the-art performance on protein structure and function prediction tasks, yet their internal computations remain opaque. Sparse autoencoders (SAEs) have been used to recover sparse features, called latents, from pLM layer representations, whose activations correlate with known biological concepts. However, prior work has not established which model concepts are causally necessary for pLM performance on downstream tasks. Here, we adapt causal activation patching to the pLM setting and perform it in SAE latent space to extract the minimal circuit responsible for accuracy in a contact prediction task for two case study proteins. We observe that preserving only a tiny fraction of latent–token pairs (0.022% and 0.015%) is sufficient to retain contact prediction accuracy in a residue unmasking experiment. Our circuit indicates a two-step computation in which early-layer motif detectors respond to short local sequence patterns, gating mid-to-late domain detectors which are selective for protein domains and families. Path-level ablations confirm the causal dependence of domain latents on upstream motif latents. To evaluate these components quantitatively, we introduce two diagnostics: a Motif Conservation Test and a Domain Selectivity Framework that supports hypothesis-driven tests. All candidate motif-detector latents pass the conservation test, and 18/23 candidate domain-detector latents achieve AUROC ≥0.95. To our knowledge, this is the first circuits-style causal analysis for pLMs, pinpointing the motifs, domains, and motif-domain interactions that drive contact prediction in two specific case studies. The framework introduced herein will enable future mechanistic dissection of protein language models. Code available at https://github.com/NainaniJatinZ/plm_circuits
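As a rough illustration of what patching in SAE latent space can look like, the following toy sketch keeps clean-run latent values only at selected latent–token pairs and uses corrupted-run values everywhere else. It is not taken from the linked repository. The ReLU SAE, the random weights, the array shapes, the function names, and the choice to add back the corrupted run's reconstruction error are all assumptions made for illustration.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    # Toy ReLU encoder (assumed): (tokens, d_model) -> (tokens, n_latents)
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(z, W_dec, b_dec):
    # Linear decoder: (tokens, n_latents) -> (tokens, d_model)
    return z @ W_dec + b_dec

def patch_in_latent_space(clean_x, corrupt_x, keep_mask, W_enc, b_enc, W_dec, b_dec):
    """Keep clean latents where keep_mask is True, corrupted latents elsewhere.

    The SAE reconstruction error of the corrupted run is added back
    (an assumed design choice), so an empty mask reproduces corrupt_x.
    """
    z_clean = sae_encode(clean_x, W_enc, b_enc)
    z_corrupt = sae_encode(corrupt_x, W_enc, b_enc)
    z_patched = np.where(keep_mask, z_clean, z_corrupt)
    residual = corrupt_x - sae_decode(z_corrupt, W_dec, b_dec)
    return sae_decode(z_patched, W_dec, b_dec) + residual

# Hypothetical toy dimensions and random stand-ins for layer activations.
rng = np.random.default_rng(0)
n_tokens, d_model, n_latents = 6, 8, 32
W_enc = rng.normal(size=(d_model, n_latents))
b_enc = np.zeros(n_latents)
W_dec = rng.normal(size=(n_latents, d_model))
b_dec = np.zeros(d_model)
clean_x = rng.normal(size=(n_tokens, d_model))
corrupt_x = rng.normal(size=(n_tokens, d_model))

# Preserve a single latent-token pair: latent 5 at token position 2.
keep_mask = np.zeros((n_tokens, n_latents), dtype=bool)
keep_mask[2, 5] = True

patched_x = patch_in_latent_space(
    clean_x, corrupt_x, keep_mask, W_enc, b_enc, W_dec, b_dec
)
kept_fraction = keep_mask.mean()  # fraction of latent-token pairs preserved
```

In a real analysis, `patched_x` would replace the layer's activations in the forward pass and the downstream contact prediction metric would be re-measured; `kept_fraction` is the toy analogue of the percentages of preserved latent–token pairs reported above.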