Successively Pruned Q-Learning: Using Self Q-function to Reduce the Overestimation


Abstract

It is well known that Q-learning suffers from overestimation because it uses the maximum state-action value as an approximation of the maximum expected state-action value. Double Q-learning and related algorithms have been proposed as efficient remedies for this overestimation. However, these methods rely on multiple Q-functions to reduce the bias and ignore the information contained in a single Q-function. In this paper, 1) we reinterpret the update process of Q-learning and build a more precise model that is compatible with the previous one; 2) we propose a novel and simple method that controls the maximum bias by exploiting the information of a single Q-function; 3) our method not only balances overestimation and underestimation, but also attains the minimum bias under proper hyper-parameters; and 4) it generalizes naturally to both the discrete control domain and continuous control tasks. We show that our algorithms outperform Double DQN and other baselines on several representative games, and that classical off-policy actor-critic algorithms also benefit from our method. Finally, we extend our algorithm to multi-agent reinforcement learning.
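To make the source of the overestimation concrete, the following is a minimal numerical sketch, not the paper's pruning method: it contrasts the standard max-based Q-learning target with a Double Q-learning style double estimator on a toy problem where every action has true value zero. The setup, variable names, and parameters are illustrative assumptions.

```python
import numpy as np

# Sketch (not the paper's algorithm): the standard Q-learning target uses
# max_a Q(s', a), and with noisy estimates E[max_a Q(s', a)] >= max_a E[Q(s', a)],
# so the bootstrap target is biased upward.

rng = np.random.default_rng(0)

n_actions = 10
true_values = np.zeros(n_actions)   # every action is equally worthless (true max = 0)
noise_std = 1.0
n_trials = 10_000

standard_targets = []
double_targets = []
for _ in range(n_trials):
    # Two independent noisy Q-estimates of the same true values.
    q_a = true_values + rng.normal(0.0, noise_std, size=n_actions)
    q_b = true_values + rng.normal(0.0, noise_std, size=n_actions)

    # Standard Q-learning: bootstrap on the maximum of one estimator.
    standard_targets.append(q_a.max())

    # Double Q-learning: select the action with one estimator,
    # evaluate it with the other, which removes the upward bias.
    best = int(np.argmax(q_a))
    double_targets.append(q_b[best])

print("true max value:               ", true_values.max())          # 0.0
print("mean standard (max) target:   ", np.mean(standard_targets))  # > 0: overestimation
print("mean double-estimator target: ", np.mean(double_targets))    # ~ 0: bias removed
```

The single-estimator mean is clearly positive while the double-estimator mean is close to zero, which is the gap the abstract refers to; the paper's contribution is to control this bias using only a single Q-function rather than the two estimators shown here.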
