Motion-driven Adaptive Frame Selection Strategy for Video Action Recognition
Abstract
Frame selection is a fundamental issue in video action recognition. It aims to minimize temporal redundancy and reduce computational cost. Current frame sampling strategies either rely on motion-based uniform sampling, which lacks emphasis on discriminative frames, or employ complex learning models or additional modal information, which compromises generalizability. To address these challenges, this paper presents an adaptive frame selection strategy. It filters redundant frames using motion information and models the relationships between each frame and the others, thereby predicting the significance of each frame. The strategy thus combines the advantages of motion prior information and supervised learning. During training, constraints related to frame importance are integrated, guiding the selection of frames with strong discriminative features as inputs to the action recognition network. The frame selection method is integrated with backbone networks such as TDN, GCTDN, and AIM, and tested on three action datasets: Diving-48, UCF101, and HMDB51. The improvement in action recognition is 4.4% on the Diving-48 dataset, 1.9% on the UCF101 dataset, and 2.3% on the HMDB51 dataset. Experimental results demonstrate that the selection strategy can be integrated with state-of-the-art action recognition models, leading to improved recognition performance.
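To make the idea of motion-based redundancy filtering concrete, the following is a minimal sketch and not the method of the paper. It assumes motion can be approximated by the mean absolute difference between consecutive frames and keeps the k frames with the largest scores. The function names `motion_scores` and `select_frames` are hypothetical, and the learned frame-importance predictor described in the abstract is omitted entirely.

```python
import numpy as np


def motion_scores(frames):
    """Approximate per-frame motion (an assumption, not the paper's measure).

    frames: array of shape (T, H, W) or (T, H, W, C).
    Returns T scores: the mean absolute difference between each frame and
    its predecessor. The first frame has no predecessor and gets score 0.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])
    # Average over every axis except the time axis.
    per_frame = diffs.mean(axis=tuple(range(1, diffs.ndim)))
    return np.concatenate([[0.0], per_frame])


def select_frames(frames, k):
    """Return the indices of the k highest-motion frames, in temporal order."""
    scores = motion_scores(frames)
    k = min(k, len(scores))
    # Stable sort so ties are broken in favour of earlier frames.
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top)


if __name__ == "__main__":
    # Toy clip: six 4x4 grayscale frames with changes at t=2 and t=5.
    clip = np.zeros((6, 4, 4))
    clip[2:5] = 5
    clip[5] = 9
    print(select_frames(clip, 2))
```

In a full pipeline along the lines the abstract describes, a score of this kind would only act as a prior for discarding near-duplicate frames; the final choice would come from a trained module that predicts frame importance.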