DPP-CL: Orthogonal Subspace Continual Learning for Dialogue Policy Planning

Abstract

Continual learning (CL) is critical for large language models (LLMs) to tackle multiple tasks across diverse downstream applications. However, LLMs struggle to maintain performance when trained on a sequence of tasks because of catastrophic forgetting, in which the parameters learned for old tasks are overwritten during training on new tasks. Existing studies typically address this issue with regularization-based, rehearsal-based, or parameter-isolation-based methods, but these approaches introduce new problems of their own, such as privacy concerns and difficulty handling long sequences. To address these problems and enhance performance, we propose DPP-CL, a CL-based Dialogue Policy Planner (DPP) designed to mitigate forgetting and improve performance. Specifically, we use gradient descent in an orthogonal subspace to learn new tasks for the DPP in multi-task learning. To overcome the difficulty that parameter-isolation-based approaches have with long sequences, we concatenate hyperbolic spherical embeddings with Euclidean embeddings to form a novel representation, enhancing the ability of LLMs to understand long-sequence texts. The DPP training process proceeds sequentially through supervised learning, knowledge distillation, and reinforcement learning. We conducted comparative experiments on standard CL benchmarks and incorporated retrieval-augmented generation into the dialogue architecture for multi-task inference in cybersecurity.
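The abstract does not spell out how the orthogonal-subspace update is realized. One common way to implement gradient descent in an orthogonal subspace is to project each new-task gradient onto the complement of a basis spanned by directions important to earlier tasks. The sketch below illustrates that idea in PyTorch; the model, the per-parameter bases, and the training loop are illustrative assumptions, not the authors' code.

```python
import torch

def project_orthogonal(grad: torch.Tensor, prev_basis: torch.Tensor) -> torch.Tensor:
    """Remove the components of `grad` that lie in the subspace spanned by
    `prev_basis` (columns: orthonormal directions important to old tasks),
    so the update stays orthogonal to previously learned directions."""
    g = grad.reshape(-1)
    proj = prev_basis @ (prev_basis.T @ g)  # projection onto the old-task subspace
    return (g - proj).reshape(grad.shape)

def training_step(model, loss_fn, batch, optimizer, bases: dict) -> float:
    """One new-task update with gradients projected into the orthogonal subspace.
    `bases` maps parameter names to orthonormal bases collected from old tasks
    (hypothetical bookkeeping; how the bases are built is not described here)."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch["input"]), batch["target"])
    loss.backward()
    for name, param in model.named_parameters():
        if param.grad is not None and name in bases:
            param.grad = project_orthogonal(param.grad, bases[name])
    optimizer.step()
    return loss.item()
```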
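Similarly, the concatenation of hyperbolic spherical and Euclidean embeddings admits a simple reading: map a learned tangent-space vector onto the Poincaré ball via the exponential map at the origin and concatenate it with a standard Euclidean embedding. The module below is a minimal sketch under that assumption; the specific hyperbolic model, curvature, and dimensions are not given in the abstract and are chosen here only for illustration.

```python
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    """Concatenate a Euclidean token embedding with a hyperbolic (Poincaré-ball)
    view of the same token, one plausible reading of the hybrid representation."""

    def __init__(self, vocab_size: int, dim: int, curvature: float = 1.0):
        super().__init__()
        self.euclidean = nn.Embedding(vocab_size, dim)
        self.hyperbolic_tangent = nn.Embedding(vocab_size, dim)  # tangent-space parameters
        self.c = curvature

    def expmap0(self, v: torch.Tensor) -> torch.Tensor:
        # Exponential map at the origin of the Poincaré ball with curvature c:
        # exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||)
        sqrt_c = self.c ** 0.5
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
        return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e = self.euclidean(token_ids)
        h = self.expmap0(self.hyperbolic_tangent(token_ids))
        return torch.cat([e, h], dim=-1)  # concatenated hybrid representation
```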
