CTAMnet: An efficient video recognition method on edge


Abstract

With the growing use of video data at the edge, efficient video recognition techniques on edge devices have become particularly important. However, existing methods face the challenge of the limited computational resources of edge devices. This paper therefore proposes CTAMnet (CNN-Transformer Attention Mobile Network), which aims to achieve efficient and accurate video recognition on edge devices. CTAMnet enhances the CNN backbone by introducing a Temporal Shift Module (TSM) and Dynamic Tanh (DyT) while keeping the model's original parameter count unchanged. An agent attention mechanism and an Enhancement-Compression Module are incorporated into the hybrid model (TSFF) to capture both the visual and temporal dependencies of video frames, and an adaptive focal loss function is introduced to improve classification performance. We conducted experiments on the UCF-101 and Kinetics-400 datasets; the results show that CTAMnet significantly improves video recognition accuracy while maintaining low computational overhead. In addition, the design of CTAMnet allows it to be easily deployed on resource-constrained edge devices, providing an effective solution for real-time video recognition tasks.
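
The abstract names three building blocks whose standard published formulations are well known: the Temporal Shift Module (TSM), which exchanges a fraction of feature channels between neighboring frames at no extra parameter cost; Dynamic Tanh (DyT), a normalization-free element-wise tanh(alpha * x) transform; and focal loss, which the paper adapts. Below is a minimal PyTorch sketch of these three components following their original papers (Lin et al. 2019; Zhu et al. 2025; Lin et al. 2017). The class names, default hyperparameters, and the fixed (non-adaptive) focal-loss parameters are illustrative assumptions; how CTAMnet actually wires and adapts these modules is described only in the full article.

```python
import torch
import torch.nn as nn


class TemporalShift(nn.Module):
    """Temporal Shift Module (Lin et al., 2019): moves a fraction of the
    channels one step forward and backward in time, letting a 2D CNN mix
    temporal information with zero additional parameters or FLOPs."""

    def __init__(self, num_segments: int, shift_div: int = 8):
        super().__init__()
        self.num_segments = num_segments  # frames sampled per clip
        self.shift_div = shift_div        # 1/shift_div of channels per direction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x arrives frame-wise, shaped (batch * num_segments, C, H, W)
        nt, c, h, w = x.shape
        n = nt // self.num_segments
        x = x.view(n, self.num_segments, c, h, w)
        fold = c // self.shift_div
        out = torch.zeros_like(x)
        out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift backward in time
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift forward in time
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # leave the rest unshifted
        return out.view(nt, c, h, w)


class DyT(nn.Module):
    """Dynamic Tanh (Zhu et al., 2025): a normalization-free replacement
    for LayerNorm, y = weight * tanh(alpha * x) + bias, with a single
    learnable scale alpha."""

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * init_alpha)
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim), e.g. transformer tokens
        return self.weight * torch.tanh(self.alpha * x) + self.bias


class FocalLoss(nn.Module):
    """Focal loss (Lin et al., 2017): down-weights easy examples with a
    (1 - p_t)^gamma modulating factor. Gamma and alpha are fixed here;
    the paper's adaptive variant presumably adjusts them during training."""

    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (N, num_classes); target: (N,) integer class labels
        log_p = torch.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-self.alpha * (1.0 - pt) ** self.gamma * log_pt).mean()
```

In the original TSM design the shift is applied in-place inside each residual block before the first convolution, and DyT is a drop-in replacement for the normalization layers of a transformer branch; the exact placement of either module within CTAMnet is an assumption here.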
