Selective Pupil Size Response Within direct and random exploration and exploitation Behaviors


Abstract

In decision-making, the explore-exploit dilemma requires balancing reward maximization against uncertainty reduction. While reinforcement learning models often treat exploration as stochastic variability, theories such as Adaptive Gain Theory (AGT) and Expected Value of Control (EVC) suggest that exploration and exploitation can reflect strategic allocation of control. Pupillometry provides a window into the locus coeruleus–norepinephrine system, indexing cognitive effort and task engagement. The present study combined pupillometry with the Horizon Task to examine whether directed exploration, random exploration, and exploitation differentially recruit cognitive resources under varying environmental contexts. Thirty-five adults completed a task manipulating three variables: value gap (reward difference), information gap (sampling imbalance), and choice horizon (1 vs. 6 free trials). Behavioral analyses replicated established findings: small value gaps promoted exploration, unequal sampling elicited information-seeking, and long horizons increased directed exploration. However, pupillary responses diverged from behavior, showing selective sensitivity to choice horizon. Pupil size was larger in short-horizon than in long-horizon conditions, suggesting increased control engagement under conditions of heightened consequence, whereas value and information gaps did not elicit significant modulation. These findings challenge the view that exploration uniformly entails increased effort and instead indicate that strategic context determines pupil-indexed engagement. Our results provide evidence that cognitive effort allocation during exploration depends on the prospective utility and irreversibility of decisions, bridging behavioral and neurocomputational perspectives.
