HIAT: Human-in-the-Loop Reinforcement Learning with Auxiliary Task
Abstract
Human-in-the-loop reinforcement learning (HIRL) addresses inefficient exploration and low sample efficiency by incorporating human feedback into the learning process to guide exploration and accelerate learning. Nevertheless, most recent approaches overlook the sample imbalance caused by differing behavior-policy distributions in the early stages of training, which degrades policy performance by biasing learning toward the dominant low-quality data. In this paper, we propose a human-in-the-loop reinforcement learning method built around an auxiliary task, namely Human-in-the-Loop Reinforcement Learning with Auxiliary Task (HIAT). Specifically, state-action pairs are used to construct an auxiliary task that assesses the similarity between different policies, forcing the model to attend to differences among data sources. During the policy evaluation phase, HIAT guides the learning of the new policy by implicitly influencing the assessment of different actions, thereby compensating for the sample imbalance. We further introduce an adaptive-weighting variant, adaptive weighting based HIAT (AWHIAT), which adjusts the influence of the auxiliary task on the primary task according to their task relevance, mitigating the inter-task conflicts caused by cognitive load. Empirical evaluations on six continuous-control benchmarks demonstrate that AWHIAT significantly improves mean episode reward and sample efficiency compared to four representative HIRL methods.
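To make the two mechanisms in the abstract concrete, the sketch below is a minimal PyTorch illustration, not the paper's implementation: an auxiliary head classifies which behavior policy produced each state-action pair, and the auxiliary loss is weighted by one common task-relevance heuristic (gradient cosine similarity with the primary loss). The names `SourceDiscriminator`, `adaptive_aux_weight`, and `total_loss`, and the cosine-similarity weighting rule itself, are assumptions made for illustration; the paper's exact auxiliary objective and weighting rule may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceDiscriminator(nn.Module):
    """Auxiliary head (hypothetical): classify which behavior policy, e.g.
    human vs. agent, generated a given (state, action) pair. In a method
    like HIAT this head would typically share features with the critic so
    the shared representation encodes differences among data sources."""
    def __init__(self, state_dim, action_dim, hidden_dim=128, n_sources=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_sources),
        )

    def forward(self, state, action):
        # Logits over the candidate data sources.
        return self.net(torch.cat([state, action], dim=-1))

def adaptive_aux_weight(main_loss, aux_loss, shared_params):
    """One task-relevance heuristic (an assumption, not necessarily the
    paper's rule): gate the auxiliary loss by the cosine similarity between
    its gradient and the primary-task gradient on shared parameters,
    clamped at zero so a conflicting auxiliary gradient is suppressed."""
    g_main = torch.autograd.grad(main_loss, shared_params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, shared_params, retain_graph=True)
    g_main = torch.cat([g.reshape(-1) for g in g_main])
    g_aux = torch.cat([g.reshape(-1) for g in g_aux])
    cos = F.cosine_similarity(g_main, g_aux, dim=0)
    return torch.clamp(cos, min=0.0).detach()

def total_loss(critic_loss, aux_logits, source_labels, shared_params):
    """Hypothetical training objective: primary policy-evaluation (critic)
    loss plus the adaptively weighted source-classification loss, where
    source_labels mark each transition's behavior policy."""
    aux_loss = F.cross_entropy(aux_logits, source_labels)
    w = adaptive_aux_weight(critic_loss, aux_loss, shared_params)
    return critic_loss + w * aux_loss
```

Under these assumptions, the auxiliary classification loss pulls the shared representation toward distinguishing data sources, which implicitly reshapes the value estimates used in policy evaluation, while the clamped cosine weight lets the auxiliary signal contribute only when it agrees with the primary objective.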