Efficient Attention Vision Transformers for Monocular Depth Estimation on Resource-Limited Hardware
Abstract
Vision Transformers achieve strong results across the current Deep Learning landscape and can tackle complex, dense prediction tasks such as Monocular Depth Estimation. However, the attention module in the transformer architecture incurs a cost that grows quadratically with the number of processed tokens. In dense Monocular Depth Estimation, this inherently high computational complexity leads to slow inference and poses significant challenges, particularly in resource-constrained onboard applications. Efficient attention modules have been developed to mitigate this issue. In this paper, we leverage these techniques to reduce the computational cost of networks designed for Monocular Depth Estimation, seeking an optimal trade-off between the quality of the results and inference speed. More specifically, the optimization is applied not only to the entire network but also independently to the encoder and the decoder, to assess the model's sensitivity to these modifications. Additionally, this paper introduces the Pareto Frontier as an analytic method for identifying the optimal trade-off between the two objectives of quality and inference time. The results indicate that several optimized networks achieve performance comparable to, and in some cases exceeding, that of their respective baselines, while significantly improving inference speed.
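As a minimal, hedged sketch of the Pareto-Frontier selection mentioned above: each candidate network is reduced to an (inference time, error) pair, both to be minimized, and only the non-dominated points are kept. The model names and numbers below are purely illustrative placeholders, not results from the paper.

```python
def pareto_frontier(candidates):
    """Return the non-dominated (time, error, name) points, sorted by inference time.

    A candidate is dominated if a faster (or equally fast) candidate
    already achieves a strictly lower error.
    """
    frontier = []
    best_error = float("inf")
    for time_ms, error, name in sorted(candidates):
        # Keep the point only if no faster candidate has a lower error.
        if error < best_error:
            frontier.append((time_ms, error, name))
            best_error = error
    return frontier


if __name__ == "__main__":
    # Hypothetical (inference time in ms, depth error, variant) tuples -- illustrative only.
    models = [
        (120.0, 0.072, "baseline ViT"),
        (85.0, 0.075, "efficient encoder"),
        (70.0, 0.081, "efficient encoder + decoder"),
        (95.0, 0.090, "efficient decoder"),  # dominated: slower and less accurate than the 85 ms variant
    ]
    for time_ms, err, name in pareto_frontier(models):
        print(f"{name}: {time_ms:.0f} ms, error {err:.3f}")
```

Under these assumptions, the dominated variant is discarded and the remaining points trace the quality-versus-speed frontier from which an operating point can be chosen.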