Hybrid Architecture for Automatic Video-Based Fall Detection Using YOLOv11, MediaPipe Pose, and LSTM Networks


Abstract

Falls represent one of the leading causes of injury and loss of autonomy among older adults worldwide. This work proposes a lightweight hybrid deep learning architecture for automatic fall detection, combining person detection with YOLOv11m, human pose estimation with MediaPipe, and temporal analysis using a long short-term memory (LSTM) network. Evaluated on the Le2i dataset, the model classified frames into normal activity, fall in progress, and person on the floor, achieving an overall accuracy of 99.23% and a weighted F1-score of 97.38%. The system matches or outperforms recent hybrid and transformer-based approaches while requiring lower computational resources, demonstrating its suitability for real-time embedded or home monitoring applications. Future work will focus on evaluating performance in uncontrolled environments and on optimization for edge computing.
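The abstract describes a three-stage pipeline: per-frame person detection (YOLOv11m), pose estimation (MediaPipe, 33 body landmarks), and sequence classification with an LSTM over the three classes. The sketch below illustrates only the temporal stage in plain NumPy; the hidden size, gate layout, and feature dimension (33 landmarks × 2 coordinates) are assumptions for illustration, not the paper's trained configuration, and the weights are random.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Illustrative single-layer LSTM + softmax head for the temporal stage.

    Input: a sequence of per-frame pose vectors, assumed here to be
    33 MediaPipe landmarks x 2 coordinates = 66 features per frame.
    Weights are randomly initialized; this is a structural sketch only.
    """

    def __init__(self, n_features=66, n_hidden=32, n_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four gates
        # (input, forget, cell candidate, output).
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.Wout = rng.normal(0.0, 0.1, (n_classes, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, seq):
        H = self.n_hidden
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x in seq:  # one pose vector per video frame
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2 * H])     # forget gate
            g = np.tanh(z[2 * H:3 * H]) # cell candidate
            o = sigmoid(z[3 * H:])      # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        logits = self.Wout @ h
        e = np.exp(logits - logits.max())
        # Probabilities over (normal activity, fall in progress, on the floor).
        return e / e.sum()

# Usage: classify a dummy 30-frame pose sequence.
probs = TinyLSTMClassifier().forward(np.zeros((30, 66)))
```

In the full system, each frame's pose vector would come from running MediaPipe inside the YOLOv11m person bounding box, and the classifier would be trained on labeled Le2i sequences rather than used with random weights.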
