Windows deep transformer Q-networks: an extended variance reduction architecture for partially observable reinforcement learning
Abstract
Recently, one of the mainstream research directions in Reinforcement Learning involves Partial Observability (PO). One approach to this problem is to build a sequence-to-sequence model. The Transformer is an excellent architecture for sequence-to-sequence modeling, and when used as the network of Deep Q-Networks, it has achieved strong results on partial observability problems. However, the Transformer relies heavily on the quality of its input data. Deep Q-Networks (DQN) suffer from overestimation, which degrades the quality of the data fed to the Transformer and results in poor training efficiency. In this work, we propose Windows Deep Transformer Q-Networks (Windows DTQN), an improved architecture that mitigates the overestimation issue in DQN by reducing the variance of Q-values, thereby enhancing input data quality and improving the training efficiency of the Transformer. Our experiments demonstrate that our approach outperforms current mainstream DQN algorithms on Partially Observable Markov Decision Processes (POMDPs).
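The abstract does not spell out the windowing mechanism itself, but one established way to reduce the variance of Q-value estimates (and thereby the max-operator overestimation bias) is to average Q-values over a sliding window of recent target-network snapshots, as in Averaged-DQN. The sketch below illustrates that general idea only; the class name `WindowedQTarget`, its parameters, and the snapshot-averaging scheme are assumptions for illustration, not the paper's actual method or API.

```python
import torch
import torch.nn as nn
from collections import deque

class WindowedQTarget:
    """Illustrative variance-reduced Q-target: average Q-values over a
    sliding window of recent target-network parameter snapshots
    (Averaged-DQN style). Hypothetical sketch, not the paper's method."""

    def __init__(self, q_net: nn.Module, window_size: int = 5, gamma: float = 0.99):
        self.window = deque(maxlen=window_size)  # last K parameter snapshots
        self.gamma = gamma
        self.snapshot(q_net)

    def snapshot(self, q_net: nn.Module) -> None:
        # Store a frozen copy of the current parameters in the window.
        self.window.append({k: v.detach().clone()
                            for k, v in q_net.state_dict().items()})

    @torch.no_grad()
    def compute(self, q_net: nn.Module, next_obs: torch.Tensor,
                reward: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
        # Averaging K noisy Q estimators cuts the estimator variance by
        # roughly a factor of K, which in turn dampens the upward bias
        # introduced by the max operator in the bootstrap target.
        live_state = {k: v.clone() for k, v in q_net.state_dict().items()}
        q_sum = 0.0
        for params in self.window:
            q_net.load_state_dict(params)
            q_sum = q_sum + q_net(next_obs)  # [batch, n_actions]
        q_net.load_state_dict(live_state)  # restore the live parameters
        q_avg = q_sum / len(self.window)
        # Standard DQN bootstrap target with the variance-reduced Q-values.
        return reward + self.gamma * (1.0 - done) * q_avg.max(dim=-1).values
```

Under this (assumed) scheme, lower-variance targets mean the sequences of transitions consumed by the Transformer carry less bootstrap noise, which is consistent with the abstract's claim that reducing Q-value variance improves the quality of the Transformer's input data.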