STGSFormer: A 3D Human Pose Estimation Model That Integrates GCN and Self-Attention in the Spatio-Temporal Domain

Abstract

Most methods that combine Transformers and graph convolutional networks (GCNs) for 3D human pose estimation (HPE) overlook the feature disparity between the two branches during fusion. In addition, GCNs are typically limited to capturing spatial relationships between local joints and cannot fully capture the temporal dependencies between adjacent frames. To address these problems, we propose STGSFormer, a network that integrates GCN with the self-attention of the Transformer. STGSFormer injects the global dependencies captured by self-attention across different joints or frames into the GCN, enabling it to account for global relations while processing local information and thereby alleviating the feature-disparity issue. Furthermore, we propose a dynamic temporal GCN block (DFGCN) that incorporates temporal distance information to enhance the feature representation capability of the temporal GCN. STGSFormer is evaluated on the Human3.6M and MPI-INF-3DHP datasets using the mean per-joint position error (MPJPE) metric, achieving 40.8 mm and 17.3 mm, respectively. These results demonstrate the superior performance of the proposed model.
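To make the fusion idea concrete, the sketch below shows one plausible way of injecting the global joint dependencies produced by self-attention into a spatial GCN layer. It is a minimal illustration under assumed details, not the authors' implementation: the module name `AttentionGuidedGCN`, the tensor shapes, and the additive mixing of the attention matrix with the skeletal adjacency are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedGCN(nn.Module):
    """Hypothetical sketch: fuse self-attention's global joint
    dependencies into a spatial GCN layer (details assumed)."""
    def __init__(self, dim, num_joints):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.scale = dim ** -0.5
        # learnable adjacency, initialised as identity as a stand-in
        # for the kinematic skeleton graph
        self.adj = nn.Parameter(torch.eye(num_joints))
        self.gcn_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, num_joints, dim) joint features of one frame
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # global dependencies between all joint pairs
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # mix the attention map with the local skeletal adjacency so the
        # graph convolution sees both local and global relations
        mixed_adj = self.adj.unsqueeze(0) + attn        # (B, J, J)
        x_gcn = mixed_adj @ self.gcn_proj(x)            # graph convolution
        x_attn = attn @ v                               # standard attention output
        return self.out_proj(x_gcn + x_attn)

# Usage with an assumed 17-joint (Human3.6M-style) skeleton and 256-dim features
layer = AttentionGuidedGCN(dim=256, num_joints=17)
feats = torch.randn(2, 17, 256)
out = layer(feats)  # (2, 17, 256)
```

The same pattern could, in principle, be applied along the temporal axis, with frames in place of joints and a temporal-distance-aware adjacency in place of the skeleton graph.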
