Can Uncertainty Metrics Guide Search? Evaluating Search-Time Decision-Making in Large Language Models


Abstract

Sampling-based search methods for Large Language Models (LLMs), such as Chain of Thought (CoT) and Tree of Thought (ToT), improve accuracy by exploring multiple reasoning paths. These approaches can be enhanced by search algorithms like Monte Carlo Tree Search (MCTS) and bandit strategies, which rely on effective uncertainty estimation. While many uncertainty metrics have been proposed for LLMs, we argue that most are primarily descriptive, capturing model confidence, rather than prescriptive tools for guiding search toward correct answers. In this study, we evaluate whether existing uncertainty metrics can support search-time decision-making through two diagnostic tasks: Answer Selection and Answer Refinement. Our results show that while current metrics help flag risky or error-prone responses, they fall short in guiding search toward correctness. This highlights the need for optimization-aware uncertainty metrics explicitly designed to support correctness-driven decisions. Code is available at link.
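To make the Answer Selection task concrete, here is a minimal sketch of one common baseline the abstract alludes to: sample several reasoning paths, pick the majority answer, and report the entropy of the answer distribution as a descriptive uncertainty signal. The function name and the entropy-based metric are illustrative assumptions, not the paper's actual implementation or metrics.

```python
import math
from collections import Counter

def select_answer(sampled_answers):
    """Majority-vote answer selection over sampled reasoning paths.

    Returns the most frequent final answer together with the Shannon
    entropy (nats) of the empirical answer distribution. Low entropy
    means the samples agree; high entropy flags a risky response
    (a descriptive signal, not a guarantee of correctness).
    """
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    total = len(sampled_answers)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return answer, entropy

# Example: four sampled CoT runs, three of which agree.
best, unc = select_answer(["42", "42", "17", "42"])
```

As the abstract notes, such a metric can flag disagreement among samples, but it says nothing about which disagreeing answer is correct, which is exactly the descriptive-versus-prescriptive gap the paper examines.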
