CTAMnet: An efficient video recognition method on edge


Abstract

With the growing use of video data at the edge, efficient video recognition techniques on edge devices have become particularly important. However, existing methods face the challenge of the limited computational resources of edge devices. This paper therefore proposes CTAMnet (CNN-Transformer Attention Mobile Network), which aims to achieve efficient and accurate video recognition on edge devices. CTAMnet enhances the CNN backbone by introducing a Temporal Shift Module (TSM) and Dynamic Tanh (DyT) while keeping the model's original parameter count unchanged. An agent attention mechanism and an Enhancement-Compression Module are incorporated into the hybrid model (TSFF) to capture both the visual and temporal dependencies of video frames, and an adaptive focal loss function is introduced to improve classification performance. We conducted experiments on the UCF-101 and Kinetics-400 datasets; the results show that CTAMnet significantly improves video recognition accuracy while maintaining low computational overhead. In addition, the design of CTAMnet allows it to be easily deployed on resource-constrained edge devices, providing an effective solution for real-time video recognition tasks.
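
The abstract names three building blocks whose standard published formulations are well known: the Temporal Shift Module (TSM), which exchanges a fraction of feature channels between neighboring frames at no extra parameter cost; Dynamic Tanh (DyT), a normalization-free element-wise tanh(alpha * x) transform; and focal loss, which the paper adapts. Below is a minimal PyTorch sketch of these three components following their original papers (Lin et al. 2019; Zhu et al. 2025; Lin et al. 2017). The class names, default hyperparameters, and the fixed (non-adaptive) focal-loss parameters are illustrative assumptions; how CTAMnet actually wires and adapts these modules is described only in the full article.

```python
import torch
import torch.nn as nn


class TemporalShift(nn.Module):
    """Temporal Shift Module (Lin et al., 2019): moves a fraction of the
    channels one step forward and backward in time, letting a 2D CNN mix
    temporal information with zero additional parameters or FLOPs."""

    def __init__(self, num_segments: int, shift_div: int = 8):
        super().__init__()
        self.num_segments = num_segments  # frames sampled per clip
        self.shift_div = shift_div        # 1/shift_div of channels per direction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x arrives frame-wise, shaped (batch * num_segments, C, H, W)
        nt, c, h, w = x.shape
        n = nt // self.num_segments
        x = x.view(n, self.num_segments, c, h, w)
        fold = c // self.shift_div
        out = torch.zeros_like(x)
        out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift backward in time
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift forward in time
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # leave the rest unshifted
        return out.view(nt, c, h, w)


class DyT(nn.Module):
    """Dynamic Tanh (Zhu et al., 2025): a normalization-free replacement
    for LayerNorm, y = weight * tanh(alpha * x) + bias, with a single
    learnable scale alpha."""

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * init_alpha)
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim), e.g. transformer tokens
        return self.weight * torch.tanh(self.alpha * x) + self.bias


class FocalLoss(nn.Module):
    """Focal loss (Lin et al., 2017): down-weights easy examples with a
    (1 - p_t)^gamma modulating factor. Gamma and alpha are fixed here;
    the paper's adaptive variant presumably adjusts them during training."""

    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (N, num_classes); target: (N,) integer class labels
        log_p = torch.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-self.alpha * (1.0 - pt) ** self.gamma * log_pt).mean()
```

In the original TSM design the shift is applied in-place inside each residual block before the first convolution, and DyT is a drop-in replacement for the normalization layers of a transformer branch; the exact placement of either module within CTAMnet is an assumption here.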
