A Lightweight Adapter for Efficient Fine-Tuning in Computer Vision

Abstract

Pretrained vision backbones are becoming increasingly large, making full fine-tuning expensive in both training time and storage, especially when adapting a single backbone to many tasks or data domains. In the setting of Parameter-Efficient Fine-Tuning (PEFT), we propose DT1D-Adapter, a lightweight adapter that can be plugged into both convolutional networks (ConvNets) and Transformer-based models by operating on spatial features of size \((H\times W)\). DT1D-Adapter performs axial filtering via depthwise 1D convolution along the height and/or width, where dilation enlarges the receptive field with minimal parameter increase. To control the parameter budget, the filters are parameterized by symmetric, group-shared coefficients, optionally complemented with lightweight channel mixing via grouped \((1\times 1)\) projections and a small-initialized scalar residual gate that stabilizes optimization under a limited trainable-parameter budget. Experiments on multiple image classification datasets show that DT1D-Adapter provides a strong accuracy-parameter trade-off, remaining competitive with common PEFT baselines such as SSF, BitFit, and VPT, and demonstrating notable efficiency compared with convolution-based adapters (e.g., Conv-Adapter) and residual-branch variants (e.g., Residual Adapters). We further report a simple video-stream inference benchmark, indicating that DT1D-Adapter remains compatible with real-time deployment on common GPU platforms.
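To make the abstract's description concrete, the core operation (depthwise dilated 1D filtering along each spatial axis with a symmetric kernel and a small-initialized scalar residual gate) can be sketched as below. This is a minimal NumPy illustration under our own assumptions about the parameterization; the function names, kernel construction, and default values are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np

def symmetric_kernel(half):
    # Build a symmetric 1D kernel [c_k, ..., c_1, c_0, c_1, ..., c_k]
    # from k+1 free coefficients [c_0, c_1, ..., c_k]; symmetry halves
    # the number of trainable filter parameters.
    return np.concatenate([half[:0:-1], half])

def axial_depthwise_conv1d(x, kernel, dilation=1, axis=-1):
    # Depthwise dilated 1D convolution along one spatial axis of x,
    # zero-padded so the output keeps the input spatial size.
    k = len(kernel)
    pad = dilation * (k // 2)
    xm = np.moveaxis(x, axis, -1)
    padded = np.pad(xm, [(0, 0)] * (xm.ndim - 1) + [(pad, pad)])
    out = np.zeros_like(xm)
    for i, w in enumerate(kernel):
        offset = i * dilation  # dilation skips taps to enlarge the receptive field
        out += w * padded[..., offset:offset + xm.shape[-1]]
    return np.moveaxis(out, -1, axis)

def dt1d_adapter(x, half_coeffs, dilation=1, gate=0.0):
    # x: feature map of shape (C, H, W); half_coeffs: group-shared symmetric
    # filter coefficients; gate: scalar residual gate, initialized near zero
    # so the adapted model starts close to the frozen backbone.
    kern = symmetric_kernel(half_coeffs)
    y = axial_depthwise_conv1d(x, kern, dilation, axis=-1)  # along width
    y = axial_depthwise_conv1d(y, kern, dilation, axis=-2)  # along height
    return x + gate * y
```

With `gate=0.0` the adapter is an exact identity, which is the usual motivation for small-initializing a residual gate in PEFT modules; the grouped \((1\times 1)\) channel mixing mentioned in the abstract is omitted here for brevity.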
