Optimal Complexity in Lightweight Vision Transformers: A Trade-off Analysis between Representational Power and Optimization Efficiency
Abstract
The deployment of deep learning models on resource-constrained edge devices requires a careful balance between performance and complexity. This study systematically challenges the prevailing assumption that augmenting lightweight vision transformers with sophisticated modules invariably improves performance. Investigating the impact of structural enhancements to the state-of-the-art lightweight Vision Transformer RepViT-M0.9, our experiments on ImageNet-1K reveal that increasing structural complexity can significantly degrade both accuracy and parameter efficiency. Visualizations and feature-space analysis suggest that excessive complexity in a lightweight model impairs feature representations and introduces optimization challenges. We propose the Representation-Optimization Trade-off Theory, which models performance as a balance between representational power and optimization cost. Our findings demonstrate that an optimal complexity level exists for lightweight models, beyond which performance deteriorates. This work highlights the importance of structural simplicity and parameter efficiency in developing effective AI solutions for edge devices. The source code and pre-trained models are available at https://github.com/niyaobuyaochibl/ACR-RepViT (DOI: 10.5281/zenodo.16959886).
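To make the trade-off concrete, a minimal sketch under our own assumptions (the abstract does not specify a functional form; the symbols $P$, $R$, $O$, and $c$ are hypothetical) is

$$P(c) = R(c) - O(c), \qquad R'(c) > 0,\; R''(c) < 0, \qquad O'(c) > 0,\; O''(c) \ge 0,$$

where $c$ denotes structural complexity, $R(c)$ the representational power (growing with diminishing returns), and $O(c)$ the optimization cost. Under these assumptions the optimal complexity $c^{*} = \arg\max_{c} P(c)$ satisfies $R'(c^{*}) = O'(c^{*})$; beyond $c^{*}$, the marginal optimization cost exceeds the marginal representational gain, and performance deteriorates.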