SkeleRGB-Net: Towards Real-Time Behavior Recognition in Rail Scenes via Adaptive Skeletal-RGB Stream Fusion


Abstract

This paper proposes SkeleRGB-Net, a multimodal network that combines skeleton and RGB streams to identify abnormal behaviors in subway security scenarios. First, a multimodal feature-fusion decision mechanism is constructed on top of the YOLO-Pose framework. Second, a depth-information module geometrically corrects keypoints using their three-dimensional spatial coordinates, effectively alleviating errors caused by image distortion. In addition, a Lightweight Feature Extractor module reduces the model's computational complexity through partial convolution and a layered dynamic-ratio strategy. Experimental results on the hand-held screen door (HSD), Stanford 40, and PPMI datasets show that SkeleRGB-Net effectively integrates the two modalities, processing visual and skeletal data in parallel while reaching an inference speed of 117 FPS, which underscores its strong real-time capability.
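The partial-convolution idea mentioned in the abstract can be illustrated with a minimal sketch: a regular convolution is applied to only a fraction of the input channels, while the remaining channels pass through untouched, which cuts FLOPs roughly in proportion to the convolved-channel ratio. This is a generic illustration (in the style of FasterNet-type partial convolution), not the paper's actual implementation; the class name `PartialConv` and the `ratio` parameter are assumptions for demonstration.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Sketch of a partial convolution: convolve only the first
    `ratio` fraction of channels, pass the rest through unchanged.
    Hypothetical illustration, not the paper's exact module."""

    def __init__(self, channels: int, ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        # Number of channels actually convolved (at least 1).
        self.conv_channels = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(
            self.conv_channels, self.conv_channels,
            kernel_size, padding=kernel_size // 2,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels: `head` is convolved, `tail` is an identity path.
        head, tail = torch.split(
            x, [self.conv_channels, x.shape[1] - self.conv_channels], dim=1
        )
        return torch.cat([self.conv(head), tail], dim=1)


x = torch.randn(1, 64, 32, 32)
layer = PartialConv(64, ratio=0.25)  # only 16 of 64 channels are convolved
y = layer(x)
print(tuple(y.shape))  # output keeps the input's shape
```

Because only a quarter of the channels enter the convolution here, the layer's convolutional cost is roughly a quarter of a full convolution's, while the identity path preserves the remaining feature channels for later layers.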
