The blessing and curse of Value-Shaping imitation
Abstract
Empirical studies suggest that, in the context of reinforcement learning, imitation is instantiated as a form of Value-Shaping (VS) process, in which the actions of another learner (the demonstrator) affect the learner's value function. These studies also show that, consistent with evolutionary theory, imitation is controlled through a meta-learning process in which inaccurate demonstrators are progressively imitated less. In the present study we aimed to evaluate the normative status of Value-Shaping in comparison to other forms of imitation and vicarious learning. To do so, we simulated increasingly complex scenarios and showed that, while Value-Shaping imitation performs well in simple settings (and outperforms other forms of imitation), it is maladaptive in more complex ones. More specifically, when imitation is bidirectional and the environment unstable, VS leads to the worst possible performance. We speculate that it may contribute to echo chambers and opinion polarization in real life. Finally, we introduce a new model, conditional Value-Shaping (cVS), which overcomes these difficulties.
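To make the core idea concrete, the following is a minimal sketch of how Value-Shaping imitation could be instantiated in tabular Q-learning. The specific bonus form (a fixed reward bonus `omega` when the learner's action matches the demonstrator's) and all parameter values are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def vs_update(Q, s, a, r, s_next, demo_action,
              alpha=0.1, gamma=0.9, omega=0.5):
    """One Q-learning step with an illustrative value-shaping bonus.

    Value-Shaping: the demonstrator's behavior enters through the
    learner's VALUE FUNCTION, here as a reward bonus `omega` whenever
    the learner's action matches the demonstrated action. (The exact
    functional form is an assumption for illustration.)
    """
    shaped_r = r + omega * float(a == demo_action)
    td_target = shaped_r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage: one state, two actions, zero extrinsic reward,
# demonstrator always chooses action 1.
Q = np.zeros((1, 2))
for t in range(50):
    vs_update(Q, s=0, a=t % 2, r=0.0, s_next=0, demo_action=1)
# The demonstrated action ends up valued higher than the alternative.
```

The meta-learning control described above (imitating inaccurate demonstrators less) could be layered on top by decreasing `omega` when the demonstrator's choices predict poor outcomes; that adjustment rule is not shown here.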