Model-based and model-free valuation signals in the human brain vary markedly in their relationship to individual differences in behavioral control

Abstract

Human action selection under reinforcement is thought to rely on two distinct strategies: model-free and model-based reinforcement learning. While behavior in sequential decision-making tasks often reflects a mixture of both, the neural basis of individual differences in their expression remains unclear. To investigate this, we conducted a large-scale fMRI study with 179 participants performing a variant of the two-step task. Using both cluster-defined subgroups and computational parameter estimates, we found that the ventromedial prefrontal cortex encodes model-based and model-free value signals differently depending on individual strategy use. Model-based value signals were strongly linked to the degree of model-based behavioral reliance, whereas model-free signals appeared regardless of model-free behavioral influence. Leveraging the large sample, we also addressed a longstanding debate about whether model-based knowledge is incorporated into reward prediction errors or whether such signals are purely model-free. Surprisingly, ventral striatum prediction error activity was better explained by model-based computations, while a middle caudate error signal was more aligned with model-free learning. Moreover, individuals lacking both model-based behavior and model-based neural signals exhibited impaired state prediction errors, suggesting a difficulty in building or updating their internal model of the environment. These findings indicate that model-free signals are ubiquitous across individuals, even in those who do not behaviorally rely on model-free strategies, whereas model-based representations appear only in individuals who use a model-based strategy at the behavioral level. The absence of such representations may depend in part on underlying difficulties in forming accurate model-based predictions.
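For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch of how model-free values, model-based values, reward prediction errors, and state prediction errors are commonly formalized for a two-step task. It is not the authors' model: the abstract does not specify the task variant or the computational model, so the function names, the learning rates `alpha` and `eta`, the mixing weight `w`, and the two-action, two-state layout are all assumptions made for illustration.

```python
import numpy as np

def model_free_update(q, state, action, reward, alpha=0.1):
    """One model-free step: nudge a cached value toward the received reward.

    Returns the reward prediction error (assumed simple delta rule).
    """
    rpe = reward - q[state, action]
    q[state, action] += alpha * rpe
    return rpe

def model_based_values(transition, q_stage2):
    """First-stage values computed by planning through a transition model.

    transition[a, s] is the estimated probability that first-stage action a
    leads to second-stage state s; each action is valued by the expected
    best second-stage value.
    """
    return transition @ q_stage2.max(axis=1)

def state_prediction_error(transition, action, next_state, eta=0.1):
    """Update the transition model in place and return the state prediction error.

    The error is the surprise about the observed state, 1 - P(next_state | action);
    the row is rescaled so it remains a probability distribution.
    """
    spe = 1.0 - transition[action, next_state]
    transition[action] *= (1.0 - eta)
    transition[action, next_state] += eta
    return spe

def hybrid_values(q_mb, q_mf, w):
    """Mix the two value estimates; w = 1 is purely model-based (hypothetical weight)."""
    return w * q_mb + (1.0 - w) * q_mf
```

In a formulation of this kind, a weight like `w` is one way to express an individual's degree of model-based behavioral reliance, and the two error terms correspond to the reward and state prediction errors discussed above.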
