Can Uncertainty Metrics Guide Search? Evaluating Search-Time Decision-Making in Large Language Models


Abstract

Sampling-based search methods for Large Language Models (LLMs), such as Chain of Thought (CoT) and Tree of Thought (ToT), improve accuracy by exploring multiple reasoning paths. These approaches can be enhanced by search algorithms like Monte Carlo Tree Search (MCTS) and bandit strategies, which rely on effective uncertainty estimation. While many uncertainty metrics have been proposed for LLMs, we argue that most are primarily descriptive, capturing model confidence, rather than prescriptive tools for guiding search toward correct answers. In this study, we evaluate whether existing uncertainty metrics can support search-time decision-making through two diagnostic tasks: Answer Selection and Answer Refinement. Our results show that while current metrics help flag risky or error-prone responses, they fall short in guiding search toward correctness. This highlights the need for optimization-aware uncertainty metrics explicitly designed to support correctness-driven decisions. Code is available at link.
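To make the Answer Selection task concrete, here is a minimal sketch of one common baseline the abstract alludes to: sample several reasoning paths, pick the majority answer, and report the entropy of the answer distribution as a descriptive uncertainty signal. The function name and the entropy-based metric are illustrative assumptions, not the paper's actual implementation or metrics.

```python
import math
from collections import Counter

def select_answer(sampled_answers):
    """Majority-vote answer selection over sampled reasoning paths.

    Returns the most frequent final answer together with the Shannon
    entropy (nats) of the empirical answer distribution. Low entropy
    means the samples agree; high entropy flags a risky response
    (a descriptive signal, not a guarantee of correctness).
    """
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    total = len(sampled_answers)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return answer, entropy

# Example: four sampled CoT runs, three of which agree.
best, unc = select_answer(["42", "42", "17", "42"])
```

As the abstract notes, such a metric can flag disagreement among samples, but it says nothing about which disagreeing answer is correct, which is exactly the descriptive-versus-prescriptive gap the paper examines.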
