Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge


Abstract

Large language models (LLMs) integrate knowledge from diverse sources into a single set of internal weights. However, these representations are difficult to interpret, complicating our understanding of what the models have learned. Sparse autoencoders (SAEs) linearize LLM embeddings, creating monosemantic features that both provide insight into the model's comprehension and simplify downstream machine learning tasks. These features are especially important in biomedical applications where explainability is critical. Here, we evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts involving adverse drug reactions (ADRs). We transform hidden-state embeddings of drug names from Gemma2-9b-it into interpretable features and train a linear classifier on these features to classify ADR likelihood, evaluating against an established benchmark. These SAE-derived features provide strong predictive performance, yielding an AUC-ROC of 0.957 for identifying acute kidney injury, 0.902 for acute liver injury, 0.954 for acute myocardial infarction, and 0.963 for gastrointestinal bleeds. Notably, there are no significant differences (p > 0.05) in performance between the simple linear classifiers built on SAE outputs and neural networks trained on the raw embeddings, suggesting that the information lost in reconstruction is minimal. SAE-derived representations thus retain the essential information from the LLM while reducing model complexity, paving the way for more transparent, compute-efficient strategies. We believe that this approach can help synthesize the biomedical knowledge our models learn in training and be used for downstream applications, such as expanding reference sets for pharmacovigilance.
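To make the described pipeline concrete, here is a minimal sketch of the three steps the abstract outlines: pooling a hidden-state embedding for each drug name from Gemma2-9b-it, encoding it with a Gemma Scope JumpReLU SAE, and fitting a linear classifier on the sparse features. The layer index, SAE width, file path within the Gemma Scope repository, and the tiny drug/label lists are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch only: layer, SAE width/path, and the drug/label data are placeholders.
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER = 20  # assumed residual-stream layer; the paper's choice may differ

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    output_hidden_states=True,
    torch_dtype=torch.bfloat16,
)
model.eval()

# Gemma Scope JumpReLU encoder weights for the matching layer
# (the width/filename below is illustrative).
params = np.load(hf_hub_download(
    repo_id="google/gemma-scope-9b-it-res",
    filename=f"layer_{LAYER}/width_16k/average_l0_71/params.npz",
))
W_enc = torch.from_numpy(params["W_enc"]).float()
b_enc = torch.from_numpy(params["b_enc"]).float()
threshold = torch.from_numpy(params["threshold"]).float()


def sae_features(drug: str) -> np.ndarray:
    """Mean-pool the residual stream after block LAYER over the drug's
    tokens, then apply the SAE's JumpReLU encoder to obtain sparse,
    interpretable features."""
    inputs = tokenizer(drug, return_tensors="pt")  # includes the BOS token
    with torch.no_grad():
        # hidden_states[0] is the embedding layer, so block LAYER's output
        # is at index LAYER + 1.
        hidden = model(**inputs).hidden_states[LAYER + 1]
        pooled = hidden.float().mean(dim=1).squeeze(0)
        pre = pooled @ W_enc + b_enc
        acts = torch.relu(pre) * (pre > threshold)  # JumpReLU activation
    return acts.numpy()


# Placeholder labeled examples; the paper uses an established ADR benchmark.
drugs = ["gentamicin", "ibuprofen"]
labels = np.array([1, 0])  # 1 = known association with the ADR of interest

X = np.stack([sae_features(d) for d in drugs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("AUC-ROC:", roc_auc_score(labels, clf.predict_proba(X)[:, 1]))
```

In practice the classifier would be trained and evaluated on held-out splits of the reference set rather than the training data shown here; the two-drug list merely stands in for that benchmark.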
