Unified Instruction Encoding and Gradient Coordination for Multi-Task Language Models

Abstract

This paper presents an in-depth study of the theoretical foundations and optimization mechanisms of instruction tuning for multi-task generalization. We propose a unified, parameter-efficient tuning framework that integrates instruction embedding modeling, task similarity regularization, and gradient alignment, with the goal of enhancing the generalization and robustness of large language models under complex task combinations. Current instruction tuning methods often suffer from representation shift, objective conflict, and gradient interference when handling heterogeneous tasks. To address these issues, we develop a systematic solution spanning both structural design and optimization. Methodologically, we introduce semantically aligned instruction encodings to improve representation consistency across tasks. During optimization, we apply gradient projection to reduce conflicts between inter-task updates and adopt a dynamic weighting strategy based on gradient variation to improve training stability and coordination. On the theoretical side, we derive an upper bound on the generalization error based on Rademacher complexity and the KL divergence of the distribution shift, providing a formal characterization of the performance limits of multi-task instruction tuning. We conduct a series of experiments on the Super-NaturalInstructions dataset, covering different instruction formulations, generalization to unseen tasks, and robustness under task combinations. Results show that the proposed method outperforms baselines on key metrics, confirming its effectiveness in improving generalization under high task heterogeneity and in reducing the risk of conflict during cross-task learning.
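The abstract does not give the exact projection rule or weighting formula. As a concrete reference point, the sketch below shows a minimal gradient-projection step in the spirit of PCGrad (Yu et al., 2020), where conflicting components of per-task gradients are removed before averaging, together with a hypothetical variance-based weighting rule; both are illustrative assumptions, not the authors' implementation.

```python
import torch

def pcgrad_combine(task_grads):
    """Combine per-task gradients with a PCGrad-style projection:
    when two task gradients conflict (negative inner product),
    the conflicting component is projected out before averaging.
    `task_grads` is a list of flattened 1-D gradient tensors."""
    projected = [g.clone() for g in task_grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = torch.dot(g_i, g_j)
            if dot < 0:  # directions conflict
                g_i.sub_(dot / (g_j.norm() ** 2 + 1e-12) * g_j)
    return torch.stack(projected).mean(dim=0)

def variance_based_weights(grad_norm_history, eps=1e-8):
    """Hypothetical dynamic weighting rule (assumption): tasks whose
    gradient norms fluctuate more over recent steps receive smaller
    weights. `grad_norm_history` holds, per task, a list of recent
    gradient-norm values."""
    variances = torch.tensor(
        [float(torch.tensor(h).var()) for h in grad_norm_history]
    )
    inv = 1.0 / (variances + eps)
    return inv / inv.sum()

# Example: three synthetic task gradients over a 10-dim parameter vector.
grads = [torch.randn(10) for _ in range(3)]
update = pcgrad_combine(grads)
weights = variance_based_weights([[1.0, 1.2, 0.9], [2.0, 2.1, 2.0], [0.5, 1.5, 1.0]])
```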
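The abstract likewise does not state the generalization bound itself. One standard shape that combines the two quantities it names, shown purely as an illustration for a loss bounded in $[0,1]$ (the KL term enters via Pinsker's inequality; the paper's actual theorem may differ in constants and definitions), is:

```latex
\mathcal{E}_{Q}(h) \;\le\; \hat{\mathcal{E}}_{P}(h)
  \;+\; 2\,\mathfrak{R}_n(\mathcal{H})
  \;+\; \sqrt{\tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, P)}
  \;+\; 3\sqrt{\frac{\log(2/\delta)}{2n}}
```

Here $P$ is the training task distribution, $Q$ the shifted target distribution, $\hat{\mathcal{E}}_{P}(h)$ the empirical risk over $n$ samples, $\mathfrak{R}_n(\mathcal{H})$ the empirical Rademacher complexity of the hypothesis class, and $\delta$ the confidence parameter.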
