Working Memory and Reinforcement Learning Interactions when Simultaneously Pursuing Reward and Avoiding Punishment: No Relationship to Internalizing Symptoms

Peter Hitchcock
Joonhwa Kim
Michael Frank

Abstract

Humans learn adaptive behaviors via a durable but incremental reinforcement-learning (RL) system and a fast but fleeting working memory (WM) system. Past work parsing these systems focused on reward learning alone. Hence, little is known about how they interact while simultaneously learning to avoid punishment, and whether arbitrating between these demands is disrupted by psychiatric symptoms. We administered a novel reward/punishment RL-WM task to an online sample oversampled for depression and anxiety symptoms (N=298; n=275 after quality control). We found participants avoided punishment during initial learning, yet poorly retained this avoidance. Computational modeling captured this pattern by allowing the fleeting WM system to facilitate punishment avoidance, while the durable RL system retained little about punishment. We also found individual differences in a previously documented tendency for WM to blunt the RL system. Specifically, while we indeed found that a subset of high-blunting participants paradoxically showed the worst (RL-based) ultimate retention when WM had facilitated initial learning, another subset enjoyed the greatest ultimate retention in this scenario. Importantly, we found that the retention patterns in the task could not be captured by a computational model in which the RL system was replaced by a stimulus-response one. Finally, we found that task performance (analyzed behaviorally and via computational modeling) was largely spared as a function of depression/anxiety and trait rumination. Overall, our findings provide insight into how WM and RL interact when facing the ubiquitous challenge of attaining reward while simultaneously avoiding punishment, and demonstrate intact behavior under internalizing-disorder symptoms.
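To make the two-system idea in the abstract concrete, here is a minimal sketch of a generic RL-WM mixture agent. It is not the authors' fitted model: the class name `RLWMAgent`, all parameter names and values, the outcome coding (+1 reward, -1 punishment), and the choice to give the RL system a smaller learning rate for punishment are assumptions made purely for illustration. The sketch shows a working memory store that learns in one shot but decays on every trial, next to an incremental delta-rule learner whose values persist.

```python
import numpy as np


def softmax(values, beta):
    """Softmax over action values with inverse temperature beta."""
    z = beta * (values - np.max(values))  # subtract max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()


class RLWMAgent:
    """Toy mixture of a slow, durable RL learner and a fast, decaying WM store.

    All parameters are hypothetical illustration values, not fitted estimates.
    """

    def __init__(self, n_stimuli, n_actions, alpha_reward=0.1,
                 alpha_punish=0.02, wm_decay=0.2, wm_weight=0.8, beta=8.0):
        self.alpha_reward = alpha_reward  # RL learning rate after outcome >= 0
        self.alpha_punish = alpha_punish  # assumed smaller RL rate after punishment
        self.wm_decay = wm_decay          # fraction of WM content lost per trial
        self.wm_weight = wm_weight        # reliance on WM in the mixture policy
        self.beta = beta
        self.Q = np.zeros((n_stimuli, n_actions))  # durable RL values
        self.W = np.zeros((n_stimuli, n_actions))  # fleeting WM contents

    def policy(self, stimulus):
        """Choice probabilities: weighted mixture of WM and RL softmax policies."""
        p_wm = softmax(self.W[stimulus], self.beta)
        p_rl = softmax(self.Q[stimulus], self.beta)
        return self.wm_weight * p_wm + (1.0 - self.wm_weight) * p_rl

    def forget(self):
        """Decay all WM contents toward their initial value of zero."""
        self.W *= (1.0 - self.wm_decay)

    def update(self, stimulus, action, outcome):
        """Apply one trial of feedback to both systems."""
        self.forget()
        # WM stores the latest outcome in a single shot.
        self.W[stimulus, action] = outcome
        # RL moves incrementally toward the outcome (delta rule).
        alpha = self.alpha_punish if outcome < 0 else self.alpha_reward
        self.Q[stimulus, action] += alpha * (outcome - self.Q[stimulus, action])


# Usage: one rewarded and one punished action for the same stimulus.
agent = RLWMAgent(n_stimuli=2, n_actions=3)
agent.update(stimulus=0, action=1, outcome=1.0)
agent.update(stimulus=0, action=2, outcome=-1.0)
print(agent.policy(0))
```

Under these assumptions, the punished action is avoided right after feedback because WM holds the outcome at full strength, but once WM has decayed only the small RL value remains, which is one simple way a model could show good initial avoidance and poor retention.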

Version published to 10.31234/osf.io/82pyz on OSF Preprints
Oct 14, 2024

Related articles

When working memory works: Selective engagement as a tradeoff between cost and flexibility
Author: Eren Gunseli
Latest version: Feb 4, 2026

Monetary Rewards Modulate Working Memory Performance During Adolescence
Authors: Megan Spurney, Camille Phaneuf-Hadd, Leah Somerville, Catherine Insel
Latest version: Feb 10, 2026

Automatic value learning results in counterproductive human behavior
Authors: Ido Ben-Artzi, Maayan Pereg, Roy Luria, Rani Moran, Nitzan Shahar
Latest version: Mar 11, 2026
