Improving Counterfactual Story Rewriting with Policy-Gradient Approaches

Abstract

Counterfactual story rewriting is the task of revising an existing narrative in light of an alternative event while retaining the story's unchanged elements and overall coherence. The task is challenging for NLP models because the expected edits to the original story are typically small and localized, and conventional training objectives such as maximum likelihood fail to capture them effectively. In this paper, we therefore propose a reinforcement learning (RL) approach to counterfactual story rewriting that explicitly rewards the desired counterfactual changes. Specifically, we fine-tune a seq2seq model with policy-gradient methods (REINFORCE with baseline and proximal policy optimization) using a reward function designed to capture both adherence to the reference edited story and semantic coherence. Experimental results on the TimeTravel dataset show that our RL-based approach produces better rewritings than the conventionally trained baseline and outperforms two contemporary large language models on this task. Overall, our findings highlight the benefit of reinforcement learning for complex, controlled text generation tasks requiring nuanced predictions.
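To make the policy-gradient objective concrete, the sketch below shows a minimal REINFORCE-with-baseline surrogate loss for a sampled rewriting, with a reward that mixes overlap with the reference edited story and a coherence score. This is an illustrative assumption of how such an objective can be implemented, not the authors' code: the helper names (reinforce_with_baseline_loss, combined_reward), the mixing weight alpha, and the use of a greedy-decode reward as the baseline are all hypothetical choices.

```python
# Illustrative REINFORCE-with-baseline sketch for seq2seq fine-tuning.
# All names and the reward weighting below are assumptions, not the paper's implementation.
import torch

def reinforce_with_baseline_loss(token_log_probs, rewards, baseline):
    """token_log_probs: (batch, seq_len) log-probs of the sampled rewritten tokens
    rewards:            (batch,) scalar reward of each sampled rewriting
    baseline:           (batch,) e.g. reward of a greedy decode (self-critical baseline)
    """
    advantage = (rewards - baseline).detach()   # reward signal is not backpropagated through
    seq_log_prob = token_log_probs.sum(dim=1)   # log p(y | x) of each sampled sequence
    return -(advantage * seq_log_prob).mean()   # policy-gradient surrogate loss

def combined_reward(overlap_score, coherence_score, alpha=0.5):
    """Hypothetical reward mixing adherence to the reference edit with semantic coherence."""
    return alpha * overlap_score + (1.0 - alpha) * coherence_score

# Toy usage with stand-in tensors (in practice these come from sampling the seq2seq model
# and from scoring its outputs with the overlap and coherence metrics).
logits = torch.randn(4, 20, requires_grad=True)
token_log_probs = torch.nn.functional.logsigmoid(logits)   # stand-in per-token log-probs
rewards = combined_reward(torch.rand(4), torch.rand(4))
baseline = combined_reward(torch.rand(4), torch.rand(4))    # e.g. greedy-decode reward
loss = reinforce_with_baseline_loss(token_log_probs, rewards, baseline)
loss.backward()
```

The baseline term only reduces the variance of the gradient estimate; sequences scoring above the baseline have their log-probability increased, and those scoring below it are down-weighted.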
