Impaired Reward-Based Learning but Preserved Motor Invigoration in Chronic Stroke


Abstract

Reward provides a feedback signal that modulates behaviour through several mechanisms, including invigoration of performance and learning of action–outcome associations to guide future choices. After stroke, the ability to utilise reward feedback can be impaired, which may limit the benefits of rehabilitation approaches that use reinforcement. One possibility is that stroke causes a global impairment of reward processing, leading to both reduced invigoration and diminished learning from feedback. Alternatively, reward processing may be selectively disrupted, such that either invigoration or the ability to update beliefs from reward feedback is disproportionately affected.

To test these competing hypotheses, we recruited forty chronic stroke survivors and thirty age-matched healthy controls to complete a probabilistic reversal learning task with both their strong (non-paretic/dominant) and weak (paretic/non-dominant) limb. On each trial, participants reached to one of two targets associated with different reward probabilities that changed unpredictably over time, requiring continued monitoring of outcomes and adaptation of choice behaviour.

Stroke survivors showed reduced reward-based learning compared to controls, expressed as lower overall choice accuracy and a greater tendency to switch responses after rewarded trials (i.e., lower win–stay rates), particularly when using the weak upper limb. Control analyses confirmed that these selective impairments were not explained by general motor impairment or cognitive deficits. To identify the putative computations underlying these behavioural differences in reward-based learning, we used an established model of hierarchical Bayesian inference, the Hierarchical Gaussian Filter (HGF). The HGF characterises learning dynamics as trial-by-trial updating of an agent’s beliefs about action–outcome probabilities and their change over time (environmental volatility). Compared to healthy controls, stroke survivors were slower to update their beliefs about action–reward contingencies, an effect most pronounced for the weak upper limb, whereas updating of beliefs about environmental volatility remained intact. Reward-based invigoration was also preserved: stronger trial-by-trial predictions about action–reward contingencies were associated with faster movement times, with comparable slopes of this association across groups, indicating that motivational drive was maintained in patients despite overall slower performance.
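As an illustration of the win–stay measure referred to above, the following is a minimal sketch of how such a rate could be computed from a sequence of choices and outcomes. It is not the authors' analysis code: the function name `win_stay_rate`, the 0/1 coding of targets and rewards, and the example data are all assumptions made for illustration.

```python
import math
from typing import Sequence


def win_stay_rate(choices: Sequence[int], rewards: Sequence[int]) -> float:
    """Proportion of rewarded trials followed by a repeat of the same choice.

    Assumed coding (hypothetical, not taken from the article):
    choices[t] is 0 or 1 (which of the two targets was selected),
    rewards[t] is 1 if trial t was rewarded and 0 otherwise.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")

    wins = 0   # rewarded trials that have a following trial
    stays = 0  # of those, trials where the next choice was the same

    # The final trial has no successor, so it cannot count as a win-stay.
    for t in range(len(choices) - 1):
        if rewards[t] == 1:
            wins += 1
            if choices[t + 1] == choices[t]:
                stays += 1

    # Undefined if there were no rewarded trials with a successor.
    return stays / wins if wins > 0 else math.nan


# Made-up example data, for illustration only.
example_choices = [0, 0, 1, 1, 0, 0]
example_rewards = [1, 0, 1, 1, 0, 1]
example_rate = win_stay_rate(example_choices, example_rewards)
```

In this made-up sequence, three rewarded trials have a following trial and the choice is repeated after two of them. A lower value of this rate corresponds to the greater tendency to switch after rewarded trials described above.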

This behavioural dissociation between preserved motivational invigoration and impaired probabilistic reward-based learning highlights a key translational opportunity: to leverage intact motivational pathways to enhance rehabilitation intensity and compliance, and to develop adaptive feedback strategies that compensate for impaired reward learning. Harnessing these complementary approaches could strengthen recovery outcomes and support greater long-term independence after stroke.
