Action Recognition in Videos Through a Transfer Learning Based Technique

Abstract

Human action recognition is an active topic in computer vision, driven by advances in deep learning. Current models achieve high accuracy on public datasets, but they require significant computational resources to train. Because transfer learning reuses what other models have already learned and allows training with fewer computational resources, in this work we propose a transfer-learning-based approach for action recognition in videos. We describe a methodology for human action recognition using transfer learning techniques on a custom dataset. The proposed method consists of four stages: 1) human detection and tracking, 2) video preprocessing, 3) feature extraction (using models pretrained on ImageNet), and 4) action recognition using a two-stream model consisting of temporal convolutional network (TCN), long short-term memory (LSTM), and convolutional neural network (CNN) layers. The custom dataset is imbalanced, with its five classes containing 189, 390, 490, 854, and 890 videos, respectively. For feature extraction, we analyzed the performance of seven pretrained models: Inception-v3, MobileNet-v2, MobileNet-v3-L, VGG-16, VGG-19, Xception, and ConvNeXt-L, and show that the best results were obtained with ConvNeXt-L. Finally, using pretrained models for feature extraction allowed training on a PC with a single GPU, reaching an accuracy of 94.9%.
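To make the pipeline concrete, the sketch below illustrates in Keras how per-frame features from an ImageNet-pretrained backbone can feed a two-stream TCN/LSTM classification head. This is a minimal sketch, not the authors' implementation: the backbone (MobileNet-v2 rather than the paper's best-performing ConvNeXt-L), the frame count `NUM_FRAMES`, the feature dimension `FEAT_DIM`, and all layer widths are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# --- Stage 3 (sketch): per-frame feature extraction with a pretrained backbone.
# MobileNet-v2 is used here for brevity; the paper's best backbone was ConvNeXt-L.
backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False  # reuse ImageNet features without fine-tuning
# Frame features would be precomputed offline, e.g.:
#   feats = backbone.predict(frames)  # frames: (num_frames, 224, 224, 3)

# --- Stage 4 (sketch): two-stream head over a sequence of frame features.
# Illustrative placeholders; FEAT_DIM=1280 matches MobileNet-v2's pooled output,
# and NUM_CLASSES=5 matches the five-class custom dataset in the abstract.
NUM_FRAMES, FEAT_DIM, NUM_CLASSES = 30, 1280, 5

inputs = layers.Input(shape=(NUM_FRAMES, FEAT_DIM))

# Stream 1: TCN-style stack of dilated causal 1-D convolutions.
x = inputs
for rate in (1, 2, 4):
    x = layers.Conv1D(128, kernel_size=3, dilation_rate=rate,
                      padding="causal", activation="relu")(x)
tcn_stream = layers.GlobalAveragePooling1D()(x)

# Stream 2: recurrent modelling of the same feature sequence.
lstm_stream = layers.LSTM(128)(inputs)

# Fuse both streams and classify into the action classes.
merged = layers.concatenate([tcn_stream, lstm_stream])
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Freezing the backbone and training only the lightweight two-stream head is what makes single-GPU training feasible in this kind of setup, since backpropagation never touches the large pretrained network.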
