Decode-gLM: Tools to Interpret, Audit, and Steer Genomic Language Models

Abstract

While genomic language models are enabling the de novo design of entire genomes, they remain challenging to interpret, limiting their trustworthiness. Here, we show that sparse autoencoders (SAEs) trained on Nucleotide Transformer activations decompose hidden representations into interpretable biological features without supervision. Across layers and model sizes, SAEs identified over 100 diverse functional annotations encoded in the model’s activations. These included viral regulatory elements such as the CMV enhancer, despite viral genomes being excluded from the training data. Tracing this signal revealed contamination in reference databases, demonstrating that interpretability methods can audit training data and identify hidden data leakage. We then show that Meta-SAEs, trained on the decoder weights of another SAE, can identify conceptual hierarchies encoded in the model, including a more abstract feature related to multiple HIV annotations. We confirmed that the features identified by our SAEs were learned during pretraining by probing a randomly initialised model as a control. Finally, we demonstrate that our SAEs allow us to steer model predictions in biologically meaningful ways, showing that we can use an antibiotic-resistance SAE feature to steer the model toward the A1408G aminoglycoside-resistance mutation in the 16S rRNA gene. Together, these results establish SAEs as a method for both discovery and auditing, providing a toolkit for interpretable and trustworthy genomic foundation models. Readers can explore our findings at https://interpretglm.netlify.app/.
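The two mechanisms the abstract relies on, an overcomplete sparse autoencoder trained on transformer activations and steering by adding a feature's decoder direction back into the hidden state, can be sketched compactly. The code below is a minimal illustration, not the authors' implementation: the dimensions, the `l1_coeff` value, and the `steer` helper are all assumptions for exposition.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: reconstructs activations through an overcomplete,
    sparsely activated feature layer (illustrative sketch only)."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstruction of the input
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty on feature activations;
    # the sparsity pressure pushes each feature toward one concept.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()

def steer(hidden, sae, feature_idx, alpha=5.0):
    # Steering: add a multiple of one feature's decoder direction to the
    # hidden state, nudging downstream predictions toward that concept.
    direction = sae.decoder.weight[:, feature_idx]   # shape: (d_model,)
    return hidden + alpha * direction

# Usage with stand-in activations (dimensions are placeholders):
sae = SparseAutoencoder(d_model=1024, n_features=16384)
acts = torch.randn(8, 1024)              # mock transformer activations
x_hat, feats = sae(acts)
loss = sae_loss(acts, x_hat, feats)
steered = steer(acts, sae, feature_idx=42)
```

A Meta-SAE, as described above, would apply the same recipe one level up: its training inputs are the columns of a first SAE's decoder matrix rather than model activations, which is how more abstract, shared directions (such as the HIV-related feature) can surface.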
