From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models
Abstract
Protein language models (pLMs) are powerful predictors of protein structure and function, trained without supervision on millions of protein sequences. pLMs are thought to capture common motifs in protein sequences, but the specifics of pLM features are not well understood. Identifying these features would not only shed light on how pLMs work, but potentially uncover novel protein biology: studying the model to study the biology. Motivated by this, we train sparse autoencoders (SAEs) on the residual stream of a pLM, ESM-2. By characterizing SAE features, we determine that pLMs use a combination of generic features and family-specific features to represent a protein. In addition, we demonstrate how known sequence determinants of properties such as thermostability and subcellular localization can be identified by linear probing of SAE features. For predictive features without known functional associations, we hypothesize their role in unknown mechanisms and provide visualization tools to aid their interpretation. Our study clarifies the limitations of pLMs, and demonstrates how SAE features can be used to help generate hypotheses for biological mechanisms. We release our code, model weights, and feature visualizer.
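To make the pipeline the abstract describes concrete, the sketch below extracts residual-stream activations from an ESM-2 checkpoint and trains a sparse autoencoder on them. This is not the authors' released code: the checkpoint name, hook layer, SAE expansion factor, and L1 sparsity penalty are illustrative assumptions, and the paper's SAE variant may differ from the vanilla ReLU autoencoder shown here.

```python
# Minimal sketch: train an SAE on ESM-2 residual-stream activations.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, EsmModel

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # assumption: any ESM-2 checkpoint works
LAYER = 3                                 # assumption: which residual stream to hook
D_MODEL, D_SAE = 320, 320 * 8             # assumption: 8x feature expansion

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
plm = EsmModel.from_pretrained(MODEL_NAME).eval()

class SparseAutoencoder(nn.Module):
    """Vanilla ReLU SAE with an L1 penalty (the paper may use another variant)."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder(D_MODEL, D_SAE)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
L1_COEFF = 1e-3                           # assumption: sparsity/reconstruction trade-off

def train_step(sequences):
    # Freeze the pLM; only the SAE is trained.
    with torch.no_grad():
        batch = tokenizer(sequences, return_tensors="pt", padding=True)
        out = plm(**batch, output_hidden_states=True)
        acts = out.hidden_states[LAYER]   # (batch, seq_len, d_model) residual stream
        # One training example per residue position; a real pipeline would
        # also mask out special and padding tokens.
        acts = acts.reshape(-1, D_MODEL)
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + L1_COEFF * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]))
```

Given a trained SAE, the linear probing mentioned in the abstract would amount to fitting a linear model from the per-protein `feats` (e.g., mean-pooled over residues) to a property label such as thermostability, then inspecting which features carry the largest weights.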