Hybrid Architecture for Automatic Video-Based Fall Detection Using YOLOv11, MediaPipe Pose, and LSTM Networks
Abstract
Falls are one of the leading causes of injury and loss of autonomy among older adults worldwide. This work proposes a lightweight hybrid deep learning architecture for automatic video-based fall detection, combining person detection with YOLOv11m, human pose estimation with MediaPipe, and temporal analysis with a long short-term memory (LSTM) network. Evaluated on the Le2i dataset, the model classifies each frame into one of three states (normal activity, fall in progress, and person on the floor), achieving an overall accuracy of 99.23% and a weighted F1-score of 97.38%. The system matches or outperforms recent hybrid and transformer-based approaches while requiring fewer computational resources, demonstrating its suitability for real-time embedded or home-monitoring applications. Future work will focus on evaluating performance in uncontrolled environments and optimizing the model for edge computing.
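To make the temporal-analysis stage concrete, the sketch below shows a minimal single-layer LSTM classifier over per-frame pose features, as one plausible reading of the pipeline described in the abstract. The feature dimension (33 MediaPipe Pose landmarks with x, y coordinates), hidden size, and random weights are illustrative assumptions, not the paper's actual configuration; the real system would train these weights and feed in keypoints extracted from YOLOv11m-cropped person regions.

```python
import numpy as np

# Assumed dimensions (not taken from the paper):
N_FEATURES = 66   # 33 MediaPipe Pose landmarks * (x, y) per frame
HIDDEN = 32       # illustrative LSTM hidden size
N_CLASSES = 3     # normal activity, fall in progress, person on the floor

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Minimal single-layer LSTM plus a linear head (untrained, for illustration)."""

    def __init__(self):
        z = N_FEATURES + HIDDEN
        # Stacked weights for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * HIDDEN, z))
        self.b = np.zeros(4 * HIDDEN)
        self.W_out = rng.normal(0.0, 0.1, (N_CLASSES, HIDDEN))
        self.b_out = np.zeros(N_CLASSES)

    def forward(self, frames):
        h = np.zeros(HIDDEN)
        c = np.zeros(HIDDEN)
        for x in frames:  # one pose-feature vector per video frame
            gates = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(gates, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Logits over the three states for the whole sequence.
        return self.W_out @ h + self.b_out

model = TinyLSTMClassifier()
sequence = rng.normal(size=(30, N_FEATURES))  # e.g. a 30-frame window
logits = model.forward(sequence)
print(logits.shape)  # one logit per class
```

In a deployed system, a sliding window of recent frames would be classified continuously, with the "fall in progress" and "person on the floor" states triggering an alert.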