Windows deep transformer Q-networks: an extended variance reduction architecture for partially observable reinforcement learning
Abstract
Recently, one of the mainstream research directions in Reinforcement Learning involves Partial Observability (PO). One approach to this problem is to build a sequence-to-sequence model. The Transformer is an excellent architecture for sequence-to-sequence modeling, and when used as the network of Deep Q-Networks, it has achieved strong results on partial observability problems. However, the Transformer relies heavily on the quality of its input data. Deep Q-Networks (DQN) suffer from overestimation, which degrades the quality of the data fed to the Transformer and results in poor training efficiency. In this work, we propose Windows Deep Transformer Q-Networks (Windows DTQN), an improved architecture that mitigates the overestimation issue in DQN by reducing the variance of Q-values, thereby enhancing input data quality and improving the training efficiency of the Transformer. Our experiments demonstrate that our approach outperforms current mainstream DQN algorithms on Partially Observable Markov Decision Processes (POMDPs).
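The abstract does not spell out the windowing mechanism itself, but one established way to reduce the variance of Q-value estimates (and thereby the max-operator overestimation bias) is to average Q-values over a sliding window of recent target-network snapshots, as in Averaged-DQN. The sketch below illustrates that general idea only; the class name `WindowedQTarget`, its parameters, and the snapshot-averaging scheme are assumptions for illustration, not the paper's actual method or API.

```python
import torch
import torch.nn as nn
from collections import deque

class WindowedQTarget:
    """Illustrative variance-reduced Q-target: average Q-values over a
    sliding window of recent target-network parameter snapshots
    (Averaged-DQN style). Hypothetical sketch, not the paper's method."""

    def __init__(self, q_net: nn.Module, window_size: int = 5, gamma: float = 0.99):
        self.window = deque(maxlen=window_size)  # last K parameter snapshots
        self.gamma = gamma
        self.snapshot(q_net)

    def snapshot(self, q_net: nn.Module) -> None:
        # Store a frozen copy of the current parameters in the window.
        self.window.append({k: v.detach().clone()
                            for k, v in q_net.state_dict().items()})

    @torch.no_grad()
    def compute(self, q_net: nn.Module, next_obs: torch.Tensor,
                reward: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
        # Averaging K noisy Q estimators cuts the estimator variance by
        # roughly a factor of K, which in turn dampens the upward bias
        # introduced by the max operator in the bootstrap target.
        live_state = {k: v.clone() for k, v in q_net.state_dict().items()}
        q_sum = 0.0
        for params in self.window:
            q_net.load_state_dict(params)
            q_sum = q_sum + q_net(next_obs)  # [batch, n_actions]
        q_net.load_state_dict(live_state)  # restore the live parameters
        q_avg = q_sum / len(self.window)
        # Standard DQN bootstrap target with the variance-reduced Q-values.
        return reward + self.gamma * (1.0 - done) * q_avg.max(dim=-1).values
```

Under this (assumed) scheme, lower-variance targets mean the sequences of transitions consumed by the Transformer carry less bootstrap noise, which is consistent with the abstract's claim that reducing Q-value variance improves the quality of the Transformer's input data.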