ViT-CAAC: Contribution-Aware Adaptive Compression Framework for Vision Transformers
Abstract
The Vision Transformer (ViT) has emerged as a powerful architecture for visual tasks, capturing long-range dependencies within images and demonstrating superior performance across a variety of applications. However, the large parameter count and the high computational and memory demands of ViTs pose significant challenges for deployment. This paper introduces ViT-CAAC (Contribution-Aware Adaptive Compression), a novel multi-faceted compression framework designed to optimize ViTs. Our framework integrates block-level knowledge distillation, layer-wise quantization with precision control across hierarchical layers, and adaptive sparsity, creating a cohesive approach that substantially reduces model size while preserving performance. Through rigorous experimentation on benchmark datasets, we demonstrate that our framework achieves over 76% reduction in model size with minimal accuracy degradation (less than 0.4% Top-1 accuracy loss). This work establishes a practical approach for deploying high-performance vision models on resource-limited devices, with implications for autonomous systems, IoT, and real-time vision processing.
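To make the three components named in the abstract concrete, the sketch below illustrates, in simplified PyTorch, how block-level distillation, layer-wise (per-layer bit-width) quantization, and adaptive sparsity might be combined. This is not the authors' implementation; all function names, the 8-bit setting, and the 0.5 keep ratio are assumptions chosen for demonstration only.

```python
# Illustrative sketch only (not the paper's code): simplified versions of the
# three compression ideas named in the abstract. Bit-widths and keep ratios
# are hypothetical placeholders.
import torch
import torch.nn.functional as F

def block_distillation_loss(student_feats, teacher_feats):
    """MSE between corresponding intermediate block outputs of student and teacher ViTs."""
    return sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))

def fake_quantize(weight, num_bits):
    """Uniform symmetric fake quantization; num_bits can differ per layer."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

def adaptive_sparsity_mask(weight, keep_ratio):
    """Keep the largest-magnitude weights; keep_ratio could vary per block."""
    k = max(1, int(keep_ratio * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

# Example: compress a single linear layer with hypothetical settings.
layer = torch.nn.Linear(768, 768)
with torch.no_grad():
    mask = adaptive_sparsity_mask(layer.weight, keep_ratio=0.5)
    layer.weight.mul_(mask)                                       # prune low-magnitude weights
    layer.weight.copy_(fake_quantize(layer.weight, num_bits=8))   # simulate 8-bit precision
```

In a full pipeline along these lines, the distillation loss would be added to the task loss during fine-tuning of the compressed student, while per-layer bit-widths and sparsity ratios would be chosen according to each layer's measured contribution.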