GCN-Transformer: Multi-task Graph Convolutional Network and Transformer for Multi-Person Pose Forecasting
Abstract
Multi-person pose forecasting involves predicting the future body poses of multiple individuals over time, a task that entails complex movement dynamics and interaction dependencies. It is relevant to various fields, including computer vision, robotics, human-computer interaction, and surveillance. This paper introduces GCN-Transformer, a novel model for multi-person pose forecasting that integrates Graph Convolutional Network and Transformer architectures. We introduce novel loss terms during training that enable the model to learn interaction dependencies and the trajectories of multiple joints simultaneously. Additionally, we propose a novel pose forecasting evaluation metric, Final Joint Position and Trajectory Error (FJPTE), which assesses both local movement dynamics and global movement errors by considering the final position and the trajectory leading up to it, providing a more comprehensive assessment of movement dynamics. Comprehensive evaluations on the SoMoF Benchmark and ExPI datasets demonstrate that the proposed GCN-Transformer consistently outperforms existing state-of-the-art (SOTA) models on the VIM and MPJPE metrics. Specifically, GCN-Transformer achieves a 5% improvement over the closest SOTA model on the SoMoF Benchmark’s MPJPE metric and a 2.6% improvement over the closest SOTA model on the ExPI dataset’s MPJPE metric. Unlike other models, whose performance fluctuates across datasets, GCN-Transformer performs consistently, demonstrating its robustness in multi-person pose forecasting and providing a strong foundation for applying GCN-Transformer in different domains. The code is available at https://github.com/RomeoSajina/GCN-Transformer.
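To make the idea behind FJPTE concrete, the sketch below shows one plausible way such a metric could combine a final-position term with a trajectory term. The abstract does not give the exact formula, so the decomposition into frame-to-frame displacement error, the equal weighting of the two terms, and the tensor shapes are all assumptions for illustration; the authoritative definition is in the paper and the linked repository.

import numpy as np

def fjpte_sketch(pred, gt):
    """Illustrative FJPTE-style metric (NOT the official definition).

    pred, gt: arrays of shape (T, J, 3) - T future frames, J joints, 3D coords.
    Combines (1) final joint position error and (2) a trajectory error that
    measures how predicted frame-to-frame motion deviates from ground truth.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # (1) Final joint position error: mean Euclidean distance at the last frame.
    final_err = np.linalg.norm(pred[-1] - gt[-1], axis=-1).mean()

    # (2) Trajectory error: compare per-frame displacements along the sequence,
    # capturing the path leading up to the final position.
    pred_steps = np.diff(pred, axis=0)  # (T-1, J, 3) frame-to-frame motion
    gt_steps = np.diff(gt, axis=0)
    traj_err = np.linalg.norm(pred_steps - gt_steps, axis=-1).mean()

    return final_err + traj_err  # assumed equal weighting of the two terms

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(14, 13, 3))  # e.g. 14 future frames, 13 joints
    pred = gt + rng.normal(scale=0.05, size=gt.shape)
    print(f"FJPTE (sketch): {fjpte_sketch(pred, gt):.4f}")

A metric of this shape would penalize a prediction that ends near the correct final pose but drifts along the way, which is the stated motivation for evaluating the trajectory in addition to the final position.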