Successively Pruned Q-Learning: Using Self Q-function to Reduce the Overestimation


Abstract

It is well known that Q-learning suffers from overestimation because it uses the maximum state-action value as an approximation of the maximum expected state-action value. Double Q-learning and related algorithms have been proposed as efficient remedies for this overestimation. However, these methods rely on multiple Q-functions to reduce the bias and ignore the information contained in a single Q-function. In this paper, 1) we reinterpret the update process of Q-learning and build a more precise model that is compatible with the previous one; 2) we propose a novel and simple method that controls the maximum bias by exploiting the information of a single Q-function; 3) our method not only balances overestimation and underestimation, but also attains the minimum bias under proper hyper-parameters; and 4) it generalizes naturally to both the discrete control domain and continuous control tasks. We show that our algorithms outperform Double DQN and other baselines on several representative games, and that classical off-policy actor-critic algorithms also benefit from our method. Finally, we extend our algorithm to multi-agent reinforcement learning.
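To make the source of the overestimation concrete, the following is a minimal numerical sketch, not the paper's pruning method: it contrasts the standard max-based Q-learning target with a Double Q-learning style double estimator on a toy problem where every action has true value zero. The setup, variable names, and parameters are illustrative assumptions.

```python
import numpy as np

# Sketch (not the paper's algorithm): the standard Q-learning target uses
# max_a Q(s', a), and with noisy estimates E[max_a Q(s', a)] >= max_a E[Q(s', a)],
# so the bootstrap target is biased upward.

rng = np.random.default_rng(0)

n_actions = 10
true_values = np.zeros(n_actions)   # every action is equally worthless (true max = 0)
noise_std = 1.0
n_trials = 10_000

standard_targets = []
double_targets = []
for _ in range(n_trials):
    # Two independent noisy Q-estimates of the same true values.
    q_a = true_values + rng.normal(0.0, noise_std, size=n_actions)
    q_b = true_values + rng.normal(0.0, noise_std, size=n_actions)

    # Standard Q-learning: bootstrap on the maximum of one estimator.
    standard_targets.append(q_a.max())

    # Double Q-learning: select the action with one estimator,
    # evaluate it with the other, which removes the upward bias.
    best = int(np.argmax(q_a))
    double_targets.append(q_b[best])

print("true max value:               ", true_values.max())          # 0.0
print("mean standard (max) target:   ", np.mean(standard_targets))  # > 0: overestimation
print("mean double-estimator target: ", np.mean(double_targets))    # ~ 0: bias removed
```

The single-estimator mean is clearly positive while the double-estimator mean is close to zero, which is the gap the abstract refers to; the paper's contribution is to control this bias using only a single Q-function rather than the two estimators shown here.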
