Human Action Recognition Using YOLOv11 Ultralytics: A Comprehensive Study for Real-Time Applications

Abstract

Human action recognition (HAR) is a pivotal task in computer vision, with applications in surveillance, healthcare, robotics, and human-computer interaction. This study presents a novel framework for HAR using the YOLOv11 model by Ultralytics, a state-of-the-art object detection architecture optimized for real-time performance. We trained and evaluated the model on a custom dataset comprising 18 distinct human actions, captured in indoor environments using fisheye cameras. The actions range from everyday activities (e.g., walking, sitting) to specialized tasks (e.g., patient on stretcher, patient on wheelchair). Our results show that YOLOv11 achieves an overall mean Average Precision (mAP@0.5) of 0.401, with exceptional per-class performance on actions such as “cleaning” (mAP@0.5: 0.760), “searching” (mAP@0.5: 0.695), and “patient on wheelchair” (mAP@0.5: 0.995). We provide an in-depth analysis of the model’s training metrics, bounding box distributions, precision-recall curves, F1-confidence curves, recall-confidence curves, and confusion matrices. Additionally, we present extensive qualitative results demonstrating the model’s robustness in real-world scenarios. A comparison with existing methods, such as two-stream CNNs and Transformer-based models, highlights YOLOv11’s superior balance of accuracy and speed, making it a promising solution for real-time HAR applications. This study also discusses the model’s limitations and outlines directions for future research, paving the way for enhanced action recognition systems.
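
The training and evaluation described above follow the standard Ultralytics workflow for fine-tuning a detector on a custom dataset. As a minimal sketch (not the authors’ exact configuration), the run might look like the following, where har_fisheye.yaml is a hypothetical dataset config enumerating the 18 action classes:

# Minimal sketch: fine-tune YOLOv11 for HAR with the Ultralytics API.
# har_fisheye.yaml is a hypothetical dataset config (paths + 18 class names).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # load a pretrained YOLOv11 nano checkpoint
model.train(data="har_fisheye.yaml", epochs=100, imgsz=640)  # fine-tune on the custom dataset
metrics = model.val()  # computes per-class precision, recall, and mAP
print(metrics.box.map50)  # mean Average Precision at IoU 0.5

The val() call yields the per-class precision, recall, and mAP@0.5 figures of the kind analyzed throughout the study.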
