DRaFT-Q: Dynamic Rank-Aware Fine-Tuning under Quantization for Efficient and Reward-Sensitive Adaptation of Language Models
Abstract
Large Language Models (LLMs) demonstrate remarkable capabilities in language understanding, reasoning, and cross-domain generalization. However, their immense scale makes full fine-tuning computationally and memory-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and QLoRA address this by introducing low-rank adapters while keeping most of the model frozen. Yet these methods typically use fixed adapter ranks that do not adapt to varying layer complexity or evolving training dynamics, resulting in suboptimal convergence and inefficient use of adapter capacity. We introduce DRaFT-Q (Dynamic Rank-Aware Fine-Tuning under Quantization), a novel PEFT strategy that combines dynamic LoRA rank allocation with a token-level reward-weighted loss. DRaFT-Q adjusts adapter ranks in real time based on lightweight curvature signals derived from gradient statistics, and prioritizes semantically important tokens during learning using external or task-derived reward weights. This dual adaptation improves both parameter usage and training focus under strict memory constraints. We evaluate DRaFT-Q on reasoning and QA benchmarks, including CommonsenseQA, OpenBookQA, GSM8K, OpenAssistant, and OpenHermes, using LLaMA-2-7B and 13B models under 4-bit quantization. Experiments on constrained GPUs (T4 for 7B; L40s for 13B) show that DRaFT-Q achieves better loss convergence, improved generalization, and competitive accuracy compared to LoRA, QLoRA, and AdaLoRA, while maintaining comparable resource usage. Our findings highlight DRaFT-Q's effectiveness for dynamic, reward-aware fine-tuning of quantized LLMs at scale.
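For concreteness, the sketch below shows one way a token-level reward-weighted loss of the kind described above could be implemented in PyTorch. The function name reward_weighted_loss, the assumption that per-token reward weights arrive as a tensor aligned with the labels, and the weight-sum normalization are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reward_weighted_loss(logits, labels, token_rewards, ignore_index=-100):
    """Token-level cross-entropy weighted by per-token reward scores.

    logits:        (batch, seq_len, vocab) model outputs
    labels:        (batch, seq_len) target token ids; ignore_index marks padding
    token_rewards: (batch, seq_len) non-negative weights from an external or
                   task-derived reward signal (assumed already normalized)
    """
    vocab = logits.size(-1)
    # Per-token negative log-likelihood, with no reduction so each token
    # can be weighted individually before averaging.
    per_token_nll = F.cross_entropy(
        logits.view(-1, vocab),
        labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    ).view_as(labels)
    valid = (labels != ignore_index).float()
    weighted = per_token_nll * token_rewards * valid
    # Normalize by the total weight so the loss scale stays comparable to a
    # plain mean over valid tokens.
    return weighted.sum() / (token_rewards * valid).sum().clamp_min(1e-8)
```

In a QLoRA-style training loop, such a loss would stand in for the usual mean cross-entropy, with token_rewards supplied by whatever external or task-derived scoring function provides the reward weights.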