SkeleRGB-Net: Towards Real-Time Behavior Recognition in Rail Scenes via Adaptive Skeletal-RGB Stream Fusion


Abstract

This paper proposes SkeleRGB-Net, a multimodal network that combines skeleton and RGB streams to identify abnormal behaviors in subway security scenarios. First, a multimodal feature-fusion decision mechanism is constructed on top of the YOLO-Pose framework. Second, a depth-information module geometrically corrects keypoints using their three-dimensional spatial coordinates, effectively alleviating errors caused by image distortion. In addition, a Lightweight Feature Extractor module reduces the model's computational complexity through partial convolution and a layered dynamic-ratio strategy. Experimental results on the hand-held screen door (HSD), Stanford 40, and PPMI datasets show that SkeleRGB-Net effectively integrates the two modalities, processing visual and skeletal data in parallel while reaching an inference speed of 117 FPS, which underscores its strong real-time capability.
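The partial-convolution idea mentioned in the abstract can be illustrated with a minimal sketch: a regular convolution is applied to only a fraction of the input channels, while the remaining channels pass through untouched, which cuts FLOPs roughly in proportion to the convolved-channel ratio. This is a generic illustration (in the style of FasterNet-type partial convolution), not the paper's actual implementation; the class name `PartialConv` and the `ratio` parameter are assumptions for demonstration.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Sketch of a partial convolution: convolve only the first
    `ratio` fraction of channels, pass the rest through unchanged.
    Hypothetical illustration, not the paper's exact module."""

    def __init__(self, channels: int, ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        # Number of channels actually convolved (at least 1).
        self.conv_channels = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(
            self.conv_channels, self.conv_channels,
            kernel_size, padding=kernel_size // 2,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels: `head` is convolved, `tail` is an identity path.
        head, tail = torch.split(
            x, [self.conv_channels, x.shape[1] - self.conv_channels], dim=1
        )
        return torch.cat([self.conv(head), tail], dim=1)


x = torch.randn(1, 64, 32, 32)
layer = PartialConv(64, ratio=0.25)  # only 16 of 64 channels are convolved
y = layer(x)
print(tuple(y.shape))  # output keeps the input's shape
```

Because only a quarter of the channels enter the convolution here, the layer's convolutional cost is roughly a quarter of a full convolution's, while the identity path preserves the remaining feature channels for later layers.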
