Rational decisions in multi-step environments with few rollouts

Sixing Chen
Kristopher T. Jensen
Marcelo G Mattar

Abstract

People routinely make decisions by mentally simulating the potential outcomes of their actions. However, this process appears computationally intractable in real-world situations involving sequences of decisions with exponentially many possible futures. How the brain efficiently evaluates temporally extended decisions despite limited cognitive resources remains a fundamental puzzle. Here we present a mathematical theory showing that for decisions in multi-step environments, the rational strategy is to perform only a few mental simulations, formalized as rollouts. This is because the first rollouts provide substantially more information than later ones despite taking a similar amount of time, so the opportunity cost of additional simulations quickly outweighs their marginal benefit. Our framework demonstrates that this efficiency relies on the correlated reward structure of naturalistic environments, which allows information from one rollout to generalize to many related future paths. This theory also explains why, under resource constraints, many shallow rollouts are preferable to fewer deep ones; why apparently myopic decisions can arise without explicit temporal discounting; and how to relate the dynamics of planning to evidence accumulation models. We validate predictions of our theory in two behavioral experiments, which confirm that humans achieve higher reward rates with few rollouts and dynamically adjust their simulation depth based on available cognitive resources. These findings reveal how the brain balances the depth and breadth of mental simulation to make effective decisions under computational constraints, providing a unifying account of planning that bridges computationally intensive search algorithms in machine learning and the remarkable efficiency of human decision-making.
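The opportunity-cost argument in the abstract can be illustrated with a deliberately simplified toy model. This is not the authors' framework. It is a single choice between an uncertain action and a safe action worth 0. The uncertain action's value is assumed to come from a standard normal prior, and each rollout is assumed to return that value plus Gaussian noise while costing a fixed time `t_rollout`. All names and parameters (`expected_value`, `reward_rate`, `best_n`, `t_act`, `noise_var`) are hypothetical. The sketch shows the first rollout buying most of the attainable value, which makes the reward rate peak after only a few rollouts.

```python
import math


def expected_value(n_rollouts: int, noise_var: float = 1.0) -> float:
    """Expected reward of the better option after n rollouts (hypothetical toy model).

    Assumptions (not from the paper): the uncertain action's value ~ N(0, 1),
    the safe action is worth 0, and each rollout returns the uncertain value
    plus independent Gaussian noise with variance `noise_var`.
    """
    # Spread of the posterior mean across possible worlds: n / (noise_var + n).
    # With zero rollouts it is 0, so the agent cannot beat the safe option.
    v = n_rollouts / (noise_var + n_rollouts)
    # The agent picks the uncertain action only if its posterior mean m > 0;
    # for m ~ N(0, v), E[max(m, 0)] = sqrt(v) / sqrt(2 * pi).
    return math.sqrt(v) / math.sqrt(2 * math.pi)


def reward_rate(n_rollouts: int, t_rollout: float,
                t_act: float = 1.0, noise_var: float = 1.0) -> float:
    """Reward per unit time when each rollout costs `t_rollout` time units."""
    return expected_value(n_rollouts, noise_var) / (t_act + n_rollouts * t_rollout)


def best_n(t_rollout: float, max_n: int = 50,
           t_act: float = 1.0, noise_var: float = 1.0) -> int:
    """Number of rollouts (0..max_n) that maximizes the reward rate."""
    return max(range(max_n + 1),
               key=lambda n: reward_rate(n, t_rollout, t_act, noise_var))


if __name__ == "__main__":
    # Marginal value of each additional rollout shrinks quickly.
    for n in range(1, 5):
        gain = expected_value(n) - expected_value(n - 1)
        print(f"rollout {n}: marginal gain {gain:.3f}")
    # Cheaper rollouts raise the optimum, but it stays small.
    for t in (1.0, 0.1, 0.01):
        print(f"t_rollout={t}: best number of rollouts = {best_n(t)}")
```

In this toy, `t_rollout=1.0` makes a single rollout optimal and `t_rollout=0.1` makes two optimal. Cheaper rollouts shift the optimum upward, but it remains small because the marginal gains shrink so fast. The toy leaves out the multi-step, correlated reward structure that the abstract identifies as essential to the efficiency of few rollouts.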

Version published to 10.31234/osf.io/gpt39_v1 on OSF Preprints, Aug 13, 2025
