Multimodal Human Behavior Recognition Based on Contextual Semantics and Skeleton


Abstract

Skeleton-based human behavior recognition has been widely studied because of its efficiency and robustness to complex backgrounds. Although skeleton data accurately captures dynamic changes in human posture, it relies heavily on the quality of that data and carries no information about the surrounding environment; when skeleton keypoints are missing, recognition based on skeleton data alone degrades significantly. To address these issues, this paper proposes a behavior recognition method that combines contextual semantics with skeleton detection, fully considering the correlations among human skeletons, objects, and human-object interactions. While recognizing behaviors from human skeletons, the method simultaneously detects objects near the skeleton, forms semantic descriptions of them, and performs multimodal fusion. It uses transformer-based semantic similarity to estimate the likely correlation between behaviors and detected objects, and combines the scores from the two stages to obtain the final prediction. Experimental results show that on the UCF101 dataset, which is closer to real-world scenarios, the proposed method improves accuracy by 8.4% over PoseConv3D.
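
For illustration only, the sketch below shows the kind of two-stage late fusion the abstract describes: a skeleton-stage class score combined with a text-similarity score between action labels and detected object names. The action list, the mixing weight alpha, and the stub text encoder are all assumptions for the example, not values or code from the paper; a real pipeline would replace encode_text with a pretrained transformer sentence encoder.

```python
# Minimal sketch (assumed, not the paper's released code): fuse a skeleton-based
# action score with a semantic-similarity score between action names and
# detected object labels, then take the argmax of the combined score.
import hashlib
import numpy as np

ACTIONS = ["playing guitar", "basketball dunk", "typing"]  # hypothetical labels


def encode_text(text: str) -> np.ndarray:
    """Stand-in for a transformer text encoder; returns a unit vector.

    Assumption: in the real pipeline this would be a pretrained language-model
    embedding. Here we derive a deterministic pseudo-embedding from a hash so
    the example is self-contained and runnable.
    """
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
    vec = np.random.default_rng(seed).normal(size=128)
    return vec / np.linalg.norm(vec)


def semantic_scores(detected_objects: list[str]) -> np.ndarray:
    """Max cosine similarity between each action name and any detected object."""
    obj_embs = np.stack([encode_text(obj) for obj in detected_objects])
    return np.asarray(
        [float(np.max(obj_embs @ encode_text(action))) for action in ACTIONS]
    )


def fuse(skeleton_scores: np.ndarray, detected_objects: list[str],
         alpha: float = 0.7) -> int:
    """Weighted late fusion of the skeleton-stage and semantic-stage scores.

    alpha is a hypothetical mixing weight chosen for this sketch.
    """
    final = alpha * skeleton_scores + (1.0 - alpha) * semantic_scores(detected_objects)
    return int(np.argmax(final))


# Example: the skeleton classifier is unsure, but a detected "guitar" nearby
# shifts the combined score toward "playing guitar".
skeleton_scores = np.array([0.40, 0.35, 0.25])
print(ACTIONS[fuse(skeleton_scores, ["guitar", "chair"])])
```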
