Windows deep transformer Q-networks: an extended variance reduction architecture for partially observable reinforcement learning

Abstract

Recently, one of the mainstream research directions in reinforcement learning has been partial observability (PO). One approach to this problem is to build a sequence-to-sequence model. The Transformer is an excellent architecture for sequence-to-sequence modeling, and when used as the network inside Deep Q-Networks it has achieved strong results on partially observable problems. However, the Transformer relies heavily on the quality of its input data. Deep Q-Networks (DQN) suffer from overestimation, which degrades the quality of the input data fed to the Transformer and leads to poor training efficiency. In this work, we propose Windows Deep Transformer Q-Networks (Windows DTQN), an improved architecture that mitigates the overestimation issue in DQN by reducing the variance of the Q-values, thereby improving the quality of the input data and the training efficiency of the Transformer. Our experiments demonstrate that our approach outperforms current mainstream DQN algorithms on Partially Observable Markov Decision Processes (POMDPs).
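The abstract does not spell out how the variance of the Q-values is reduced. Below is a minimal sketch of one plausible mechanism consistent with the description: averaging target Q-values over a sliding window of recent target-network snapshots, in the spirit of Averaged-DQN. The class name `WindowedTargetEnsemble`, the window size, and the update rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: windowed averaging of target Q-values to reduce variance,
# one plausible reading of "reducing the variance of the Q-values".
import copy
from collections import deque

import torch
import torch.nn as nn


class WindowedTargetEnsemble:
    """Keeps the last `window` snapshots of the Q-network and averages
    their outputs to produce a lower-variance target estimate."""

    def __init__(self, q_network: nn.Module, window: int = 5):
        self.snapshots = deque(maxlen=window)
        self.push(q_network)  # seed with the initial network

    def push(self, q_network: nn.Module) -> None:
        # Store a frozen copy of the current network weights.
        snap = copy.deepcopy(q_network).eval()
        for p in snap.parameters():
            p.requires_grad_(False)
        self.snapshots.append(snap)

    @torch.no_grad()
    def averaged_q(self, obs: torch.Tensor) -> torch.Tensor:
        # Averaging over snapshots reduces the variance of the max-Q target,
        # the term that drives overestimation in vanilla DQN.
        return torch.stack([snap(obs) for snap in self.snapshots]).mean(dim=0)


if __name__ == "__main__":
    # Toy Q-network over flat observations; in the DTQN setting this would be
    # a Transformer encoder over a history of observations.
    q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
    ensemble = WindowedTargetEnsemble(q_net, window=5)

    obs = torch.randn(32, 8)                       # batch of observations
    target_q = ensemble.averaged_q(obs).max(dim=1).values
    print(target_q.shape)                          # torch.Size([32])
```

Under this reading, the lower-variance targets give the Transformer cleaner regression signals, which is the training-efficiency benefit the abstract claims.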