DPP-CL: Orthogonal Subspace Continual Learning for Dialogue Policy Planning

Abstract

Continual learning (CL) is critical for large language models (LLMs) to tackle multiple tasks across diverse downstream applications. However, LLMs struggle to maintain performance when trained on a sequence of tasks because of catastrophic forgetting, in which the parameters learned for old tasks are overwritten during training on new tasks. Existing studies typically address this issue with regularization-based, rehearsal-based, or parameter-isolation-based methods, but these approaches introduce new problems of their own, such as privacy concerns and difficulty handling long sequences. To address these problems and enhance performance, we propose DPP-CL, a CL-based Dialogue Policy Planner (DPP) designed to mitigate forgetting and improve performance. Specifically, we use gradient descent in an orthogonal subspace to learn new tasks for the DPP in multi-task learning. To overcome the difficulty that parameter-isolation-based approaches have with long sequences, we concatenate hyperbolic spherical embeddings with Euclidean embeddings to form a novel representation, enhancing the ability of LLMs to understand long-sequence texts. The DPP training process proceeds sequentially through supervised learning, knowledge distillation, and reinforcement learning. We conducted comparative experiments on standard CL benchmarks and incorporated retrieval-augmented generation into the dialogue architecture for multi-task inference in cybersecurity.
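The abstract does not spell out how the orthogonal-subspace update is realized. One common way to implement gradient descent in an orthogonal subspace is to project each new-task gradient onto the complement of a basis spanned by directions important to earlier tasks. The sketch below illustrates that idea in PyTorch; the model, the per-parameter bases, and the training loop are illustrative assumptions, not the authors' code.

```python
import torch

def project_orthogonal(grad: torch.Tensor, prev_basis: torch.Tensor) -> torch.Tensor:
    """Remove the components of `grad` that lie in the subspace spanned by
    `prev_basis` (columns: orthonormal directions important to old tasks),
    so the update stays orthogonal to previously learned directions."""
    g = grad.reshape(-1)
    proj = prev_basis @ (prev_basis.T @ g)  # projection onto the old-task subspace
    return (g - proj).reshape(grad.shape)

def training_step(model, loss_fn, batch, optimizer, bases: dict) -> float:
    """One new-task update with gradients projected into the orthogonal subspace.
    `bases` maps parameter names to orthonormal bases collected from old tasks
    (hypothetical bookkeeping; how the bases are built is not described here)."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch["input"]), batch["target"])
    loss.backward()
    for name, param in model.named_parameters():
        if param.grad is not None and name in bases:
            param.grad = project_orthogonal(param.grad, bases[name])
    optimizer.step()
    return loss.item()
```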
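Similarly, the concatenation of hyperbolic spherical and Euclidean embeddings admits a simple reading: map a learned tangent-space vector onto the Poincaré ball via the exponential map at the origin and concatenate it with a standard Euclidean embedding. The module below is a minimal sketch under that assumption; the specific hyperbolic model, curvature, and dimensions are not given in the abstract and are chosen here only for illustration.

```python
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    """Concatenate a Euclidean token embedding with a hyperbolic (Poincaré-ball)
    view of the same token, one plausible reading of the hybrid representation."""

    def __init__(self, vocab_size: int, dim: int, curvature: float = 1.0):
        super().__init__()
        self.euclidean = nn.Embedding(vocab_size, dim)
        self.hyperbolic_tangent = nn.Embedding(vocab_size, dim)  # tangent-space parameters
        self.c = curvature

    def expmap0(self, v: torch.Tensor) -> torch.Tensor:
        # Exponential map at the origin of the Poincaré ball with curvature c:
        # exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)
        sqrt_c = self.c ** 0.5
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
        return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e = self.euclidean(token_ids)
        h = self.expmap0(self.hyperbolic_tangent(token_ids))
        return torch.cat([e, h], dim=-1)  # concatenated hybrid representation
```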
