Hybrid Architecture for Automatic Video-Based Fall Detection Using YOLOv11, MediaPipe Pose, and LSTM Networks
Abstract
Falls are one of the leading causes of injury and loss of autonomy among older adults worldwide. This work proposes a lightweight hybrid deep learning architecture for automatic video-based fall detection, combining person detection with YOLOv11m, human pose estimation with MediaPipe, and temporal analysis with a long short-term memory (LSTM) network. Evaluated on the Le2i dataset, the model classifies each frame into one of three states (normal activity, fall in progress, and person on the floor), achieving an overall accuracy of 99.23% and a weighted F1-score of 97.38%. The system matches or outperforms recent hybrid and transformer-based approaches while requiring fewer computational resources, demonstrating its suitability for real-time embedded or home-monitoring applications. Future work will focus on evaluating performance in uncontrolled environments and optimizing the model for edge computing.
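To make the temporal-analysis stage concrete, the sketch below shows a minimal single-layer LSTM classifier over per-frame pose features, as one plausible reading of the pipeline described in the abstract. The feature dimension (33 MediaPipe Pose landmarks with x, y coordinates), hidden size, and random weights are illustrative assumptions, not the paper's actual configuration; the real system would train these weights and feed in keypoints extracted from YOLOv11m-cropped person regions.

```python
import numpy as np

# Assumed dimensions (not taken from the paper):
N_FEATURES = 66   # 33 MediaPipe Pose landmarks * (x, y) per frame
HIDDEN = 32       # illustrative LSTM hidden size
N_CLASSES = 3     # normal activity, fall in progress, person on the floor

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Minimal single-layer LSTM plus a linear head (untrained, for illustration)."""

    def __init__(self):
        z = N_FEATURES + HIDDEN
        # Stacked weights for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * HIDDEN, z))
        self.b = np.zeros(4 * HIDDEN)
        self.W_out = rng.normal(0.0, 0.1, (N_CLASSES, HIDDEN))
        self.b_out = np.zeros(N_CLASSES)

    def forward(self, frames):
        h = np.zeros(HIDDEN)
        c = np.zeros(HIDDEN)
        for x in frames:  # one pose-feature vector per video frame
            gates = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(gates, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Logits over the three states for the whole sequence.
        return self.W_out @ h + self.b_out

model = TinyLSTMClassifier()
sequence = rng.normal(size=(30, N_FEATURES))  # e.g. a 30-frame window
logits = model.forward(sequence)
print(logits.shape)  # one logit per class
```

In a deployed system, a sliding window of recent frames would be classified continuously, with the "fall in progress" and "person on the floor" states triggering an alert.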