Efficient Adaptation of Pre-trained Models: A Survey of PEFT for Language, Vision, and Multimodal Learning


Abstract

The rapid scaling of pre-trained foundation models in natural language processing (NLP), computer vision (CV), and multimodal learning has led to growing interest in methods that can adapt these large models efficiently without incurring the full computational or storage costs of traditional fine-tuning. Parameter-Efficient Fine-Tuning (PEFT) methods address this challenge by modifying or introducing a small subset of learnable parameters while keeping the majority of the model frozen. In this survey, we present a comprehensive and systematic overview of the landscape of PEFT approaches. We categorize the main families of PEFT methods—including prompt tuning, adapter tuning, low-rank adaptation (e.g., LoRA), BitFit, and sparse updating—providing unified mathematical formulations, detailed comparative analyses, and extensive discussion of their theoretical underpinnings and empirical properties. We also explore implementation considerations, evaluation benchmarks, and real-world applications across language, vision, and multimodal domains. Finally, we highlight open challenges, interpretability gaps, and future research directions in this rapidly evolving field. Our goal is to serve as a foundation for researchers and practitioners seeking to understand, apply, or advance the state of the art in parameter-efficient adaptation of large-scale models.
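To make the core idea concrete, here is a minimal NumPy sketch of one of the surveyed families, low-rank adaptation (LoRA): the pre-trained weight matrix is kept frozen and only a small low-rank correction is trained. The dimensions, initialization scale, and `alpha` scaling factor below are illustrative assumptions, not values prescribed by any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight (d_out x d_in); never updated during adaptation.
d_in, d_out, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))

# LoRA-style low-rank factors: only A and B are trainable.
# B starts at zero so the adapted model initially matches the base model.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def forward(x, W, A, B, alpha=1.0):
    """y = x W^T + alpha * x A^T B^T: base output plus low-rank correction."""
    return x @ W.T + alpha * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))

# With B = 0 the adapted model reproduces the frozen base model exactly.
assert np.allclose(forward(x, W, A, B), x @ W.T)

# Trainable parameters: r * (d_in + d_out) versus d_in * d_out for full fine-tuning.
print(A.size + B.size, W.size)
```

With these toy dimensions the adapter trains 32 parameters against 64 frozen ones; at realistic model sizes (e.g., d_in = d_out = 4096, r = 8) the ratio is far more dramatic, which is what makes the approach storage- and compute-efficient.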