Self-Attention Factor-Tuning for Parameter Efficient Fine-Tuning
Abstract
Transformers have revolutionized Natural Language Processing and Computer Vision, largely because their key innovation, the attention mechanism, captures long-range dependencies. Despite the success of these models, their growing size has led to an ever-increasing demand for processing power, limiting their practical deployment. In recent years, tensor decomposition-based parameter-efficient fine-tuning techniques have emerged as a promising way around this computational bottleneck. In this work, we investigate a modified version of Factor-Tuning that reduces the inter-layer coupling introduced by the original Factor-Tuning and applies the factorized updates exclusively to the attention mechanism. We refer to this method as Self-Attention Factor-Tuning. To evaluate the effectiveness of our approach, we conduct image-classification experiments with Vision Transformers on all 19 datasets of the VTAB-1k benchmark. The results demonstrate that the proposed framework effectively reduces the number of parameters required to fine-tune a transformer, achieving new state-of-the-art performance on three of the 19 datasets in the benchmark and outperforming the original Factor-Tuning paradigm as well as various other competitive techniques, whilst using significantly fewer parameters.
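To make the idea concrete, the sketch below shows one way an attention-only, per-layer factorized update could be attached to a Vision Transformer: each frozen attention projection is augmented with a trainable low-rank factorized delta, while the rest of the backbone stays frozen. This is a minimal illustration under stated assumptions, not the paper's reference implementation; the class name FactorTunedLinear, the rank r, the scaling s, and the timm-style attribute names qkv and proj are all assumptions made for the example, and the original Factor-Tuning additionally shares factors across layers through a tensor decomposition, which this per-layer sketch deliberately omits.

```python
# Illustrative sketch of attention-only factorized fine-tuning (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorTunedLinear(nn.Module):
    """A frozen linear layer plus a trainable factorized update s * (U @ V)."""

    def __init__(self, base: nn.Linear, r: int = 8, s: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen

        d_out, d_in = base.weight.shape
        # One factor starts at zero so fine-tuning begins from the pre-trained model.
        self.U = nn.Parameter(torch.zeros(d_out, r))
        self.V = nn.Parameter(torch.randn(r, d_in) * 0.02)
        self.s = s

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + s * x @ (U V)^T
        return self.base(x) + self.s * F.linear(x, self.U @ self.V)


def wrap_attention_only(vit: nn.Module, r: int = 8) -> nn.Module:
    """Attach factorized updates only to the self-attention projections.

    Assumes timm-style ViT attention blocks exposing `qkv` and `proj` linears;
    other implementations may use different attribute names.
    """
    for module in list(vit.modules()):
        if hasattr(module, "qkv") and isinstance(module.qkv, nn.Linear):
            if isinstance(getattr(module, "proj", None), nn.Linear):
                module.proj = FactorTunedLinear(module.proj, r)
            module.qkv = FactorTunedLinear(module.qkv, r)
    return vit
```

As a usage note, after wrapping a pre-trained ViT this way, only the U and V factors (and typically the classification head) remain trainable, so the count of parameters with requires_grad set gives the fine-tuning budget that the abstract's parameter comparisons refer to.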