Efficient Large Language Model Fine-Tuning with Joint Structural Pruning and Parameter Sharing

Abstract

This paper addresses the challenges of high computational cost and severe parameter redundancy in fine-tuning large language models. It proposes an efficient fine-tuning algorithm that integrates structural pruning with parameter sharing, operating from both architectural and optimization perspectives. The method dynamically prunes redundant connections while keeping the core model frozen, and introduces task-conditioned cross-layer sharing modules to improve representational power and parameter efficiency. A pruning residual compensation mechanism is designed to preserve semantic coherence, and a conditional sharing mapping is constructed to improve task-level consistency. The training objective jointly optimizes the task loss, a sparsity regularization term, and inter-layer consistency constraints, achieving unified parameter compression and semantic retention. The proposed method is systematically evaluated using perplexity, accuracy, and inference speed-up across different pruning rates, learning rates, input lengths, and data distribution settings. Experimental results show that the algorithm consistently outperforms mainstream fine-tuning techniques across multiple dimensions, achieving joint optimization of accuracy and efficiency with minimal parameter tuning and making it well-suited for large language model deployment and transfer learning across diverse scenarios.
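To make the composite training objective described above more concrete, the following is a minimal sketch (not the authors' implementation) of a loss that combines a task term, a sparsity regularizer over learnable pruning-mask scores, and an inter-layer consistency penalty between cross-layer shared representations. All names and weighting coefficients (`mask_logits`, `layer_reprs`, `lambda_sparsity`, `lambda_consistency`) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a joint objective: task loss + sparsity regularization
# + inter-layer consistency, assuming learnable pruning-mask logits and
# shared-layer representations are available from the model.
import torch
import torch.nn.functional as F


def joint_objective(logits, labels, mask_logits, layer_reprs,
                    lambda_sparsity=1e-4, lambda_consistency=1e-2):
    """Combine the three loss terms named in the abstract (illustrative only).

    logits:       (batch, vocab) task predictions
    labels:       (batch,) target class/token ids
    mask_logits:  list of tensors holding learnable pruning-mask scores
    layer_reprs:  list of (batch, hidden) outputs from layers that share
                  parameters; consecutive pairs are encouraged to agree
    """
    # 1) Task loss: standard cross-entropy on the model predictions.
    task_loss = F.cross_entropy(logits, labels)

    # 2) Sparsity regularization: push mask activation probabilities toward
    #    zero so that redundant connections can be pruned away.
    sparsity = sum(torch.sigmoid(m).mean() for m in mask_logits) / len(mask_logits)

    # 3) Inter-layer consistency: penalize divergence between representations
    #    produced by the cross-layer shared modules.
    consistency = sum(
        F.mse_loss(a, b) for a, b in zip(layer_reprs[:-1], layer_reprs[1:])
    ) / max(len(layer_reprs) - 1, 1)

    return task_loss + lambda_sparsity * sparsity + lambda_consistency * consistency


if __name__ == "__main__":
    # Toy usage with random tensors to show the objective is differentiable.
    batch, vocab, hidden = 4, 100, 32
    logits = torch.randn(batch, vocab, requires_grad=True)
    labels = torch.randint(0, vocab, (batch,))
    masks = [torch.randn(hidden, requires_grad=True) for _ in range(2)]
    reprs = [torch.randn(batch, hidden) for _ in range(3)]
    loss = joint_objective(logits, labels, masks, reprs)
    loss.backward()
    print(float(loss))
```

In this sketch the relative weights of the sparsity and consistency terms are fixed hyperparameters; how the paper balances or schedules these terms is not specified in the abstract.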
