DRaFT-Q: Dynamic Rank-Aware Fine-Tuning under Quantization for Efficient and Reward-Sensitive Adaptation of Language Models
Abstract
Large Language Models (LLMs) demonstrate remarkable capabilities in language understanding, reasoning, and cross-domain generalization. However, their immense scale makes full fine-tuning computationally and memory-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and QLoRA address this by introducing low-rank adapters while keeping most of the model frozen. Yet these methods typically use fixed adapter ranks that do not adapt to varying layer complexity or evolving training dynamics, resulting in suboptimal convergence and inefficient use of adapter capacity. We introduce DRaFT-Q (Dynamic Rank-Aware Fine-Tuning under Quantization), a novel PEFT strategy that combines dynamic LoRA rank allocation with a token-level reward-weighted loss. DRaFT-Q adjusts adapter ranks in real time based on lightweight curvature signals derived from gradient statistics, and prioritizes semantically important tokens during learning using external or task-derived reward weights. This dual adaptation improves both parameter usage and training focus under strict memory constraints. We evaluate DRaFT-Q on reasoning and QA benchmarks, including CommonsenseQA, OpenBookQA, GSM8K, OpenAssistant, and OpenHermes, using LLaMA-2-7B and 13B models under 4-bit quantization. Experiments on constrained GPUs (T4 for 7B; L40s for 13B) show that DRaFT-Q achieves better loss convergence, improved generalization, and competitive accuracy compared to LoRA, QLoRA, and AdaLoRA, while maintaining comparable resource usage. Our findings highlight DRaFT-Q's effectiveness for dynamic, reward-aware fine-tuning of quantized LLMs at scale.
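For concreteness, the sketch below shows one way a token-level reward-weighted loss of the kind described above could be implemented in PyTorch. The function name reward_weighted_loss, the assumption that per-token reward weights arrive as a tensor aligned with the labels, and the weight-sum normalization are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reward_weighted_loss(logits, labels, token_rewards, ignore_index=-100):
    """Token-level cross-entropy weighted by per-token reward scores.

    logits:        (batch, seq_len, vocab) model outputs
    labels:        (batch, seq_len) target token ids; ignore_index marks padding
    token_rewards: (batch, seq_len) non-negative weights from an external or
                   task-derived reward signal (assumed already normalized)
    """
    vocab = logits.size(-1)
    # Per-token negative log-likelihood, with no reduction so each token
    # can be weighted individually before averaging.
    per_token_nll = F.cross_entropy(
        logits.view(-1, vocab),
        labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    ).view_as(labels)
    valid = (labels != ignore_index).float()
    weighted = per_token_nll * token_rewards * valid
    # Normalize by the total weight so the loss scale stays comparable to a
    # plain mean over valid tokens.
    return weighted.sum() / (token_rewards * valid).sum().clamp_min(1e-8)
```

In a QLoRA-style training loop, such a loss would stand in for the usual mean cross-entropy, with token_rewards supplied by whatever external or task-derived scoring function provides the reward weights.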