Improving Counterfactual Story Rewriting with Policy-Gradient Approaches

Abstract

Counterfactual story rewriting is the task of revising an existing narrative in light of an alternative event while retaining the story's unchanged elements and overall coherence. The task is challenging for NLP models because the expected edits to the original story are typically small and localized, and conventional training objectives such as maximum likelihood fail to capture them effectively. In this paper, we therefore propose a reinforcement learning (RL) approach to counterfactual story rewriting that explicitly rewards the desired counterfactual changes. Specifically, we fine-tune a seq2seq model with policy-gradient methods (REINFORCE with baseline and proximal policy optimization) using a reward function designed to capture both adherence to the reference edited story and semantic coherence. Experimental results on the TimeTravel dataset show that our RL-based approach produces better rewritings than the conventionally trained baseline and outperforms two contemporary large language models on this task. Overall, our findings highlight the benefit of reinforcement learning for complex, controlled text generation tasks requiring nuanced predictions.
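To make the policy-gradient objective concrete, the sketch below shows a minimal REINFORCE-with-baseline surrogate loss for a sampled rewriting, with a reward that mixes overlap with the reference edited story and a coherence score. This is an illustrative assumption of how such an objective can be implemented, not the authors' code: the helper names (reinforce_with_baseline_loss, combined_reward), the mixing weight alpha, and the use of a greedy-decode reward as the baseline are all hypothetical choices.

```python
# Illustrative REINFORCE-with-baseline sketch for seq2seq fine-tuning.
# All names and the reward weighting below are assumptions, not the paper's implementation.
import torch

def reinforce_with_baseline_loss(token_log_probs, rewards, baseline):
    """token_log_probs: (batch, seq_len) log-probs of the sampled rewritten tokens
    rewards:            (batch,) scalar reward of each sampled rewriting
    baseline:           (batch,) e.g. reward of a greedy decode (self-critical baseline)
    """
    advantage = (rewards - baseline).detach()   # reward signal is not backpropagated through
    seq_log_prob = token_log_probs.sum(dim=1)   # log p(y | x) of each sampled sequence
    return -(advantage * seq_log_prob).mean()   # policy-gradient surrogate loss

def combined_reward(overlap_score, coherence_score, alpha=0.5):
    """Hypothetical reward mixing adherence to the reference edit with semantic coherence."""
    return alpha * overlap_score + (1.0 - alpha) * coherence_score

# Toy usage with stand-in tensors (in practice these come from sampling the seq2seq model
# and from scoring its outputs with the overlap and coherence metrics).
logits = torch.randn(4, 20, requires_grad=True)
token_log_probs = torch.nn.functional.logsigmoid(logits)   # stand-in per-token log-probs
rewards = combined_reward(torch.rand(4), torch.rand(4))
baseline = combined_reward(torch.rand(4), torch.rand(4))    # e.g. greedy-decode reward
loss = reinforce_with_baseline_loss(token_log_probs, rewards, baseline)
loss.backward()
```

The baseline term only reduces the variance of the gradient estimate; sequences scoring above the baseline have their log-probability increased, and those scoring below it are down-weighted.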
