Probing Hidden States for Calibrated, Alignment-Resistant Predictions in LLMs


Abstract

Scientific applications of large language models (LLMs) demand reliable, well-calibrated predictions, but standard generative approaches often fail to fully access relevant knowledge contained in their internal representations. As a result, models appear less capable than they are, with useful information remaining latent. We present PING (Probing INternal states of Generative models), an open-source framework that trains lightweight probes on frozen, HuggingFace-compatible transformers to deliver structured predictions with minimal compute overhead. Across diverse models and benchmarks, including MMLU for broad coverage and MedMCQA for clinical focus, PING matches or exceeds generative accuracy while reducing Expected Calibration Error by up to 96%. Strikingly, on an LLM explicitly safety-tuned to withhold medical information, PING recovered 87% of the lost MedMCQA performance even though generative accuracy was zero, showing that this information still exists in the model’s latent space. The accompanying pingkit package makes these methods easy to deploy and is available through PyPI.
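
For orientation, the snippet below is a minimal sketch of the general hidden-state probing idea the abstract describes, not the pingkit API: it extracts a frozen HuggingFace transformer's final-layer hidden state for each prompt and fits a lightweight linear probe on top. The model name, prompts, and labels are illustrative assumptions.

```python
# Sketch of hidden-state probing (illustrative; not the pingkit API).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # assumption: any HuggingFace-compatible model works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # the base model stays frozen; only the probe is trained

def hidden_state(prompt: str) -> torch.Tensor:
    """Return the last-layer hidden state at the final token position."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

# Toy training data: prompts paired with answer-class labels (hypothetical).
prompts = ["Question 1 ...", "Question 2 ..."]
labels = [0, 1]

features = torch.stack([hidden_state(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels)

# The probe yields class probabilities directly from internal representations,
# without any generative decoding.
print(probe.predict_proba(features))
```

In this framing, the probe is the only trained component, so the compute overhead is limited to a forward pass per example plus fitting a small classifier.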
