MAQT: Multi-scale Attention and Query-Optimized Transformer for End-to-End Pose Estimation

Hong Liang
Cuiping Wang
Mingwen Shao
Qian Zhang

Abstract

Human pose estimation is a crucial and rapidly growing area of computer vision. To address the shortcomings of existing Transformer-based pose estimation methods in handling localized features, this work presents MAQT, an enhanced end-to-end method for precise multi-person pose estimation. To improve the localization of keypoints that are sensitive to scale changes, MAQT introduces an Asym-Fusion block. Additionally, we design a new query strategy, Uncertainty-minimal Query Selection, to optimize the initial selection of queries. In the decoding phase, MAQT combines two self-attention mechanisms to capture the intricate relationships among keypoints more accurately. Experimental results on the MS COCO and CrowdPose datasets show that MAQT outperforms contemporary methods.

Version published to 10.21203/rs.3.rs-4648561/v1 on Research Square
Jul 24, 2024

Related articles

SPARK: Sparse-Perception Action Recognition with Keyframes for Quadruped Robots

This article has 2 authors:
1. Sehun Park
2. Andrew Jaeyong Choi
This article has no evaluations. Latest version: Dec 10, 2025

AttnLink: Enhancing Cross-Modal Fusion for Robust Image-to-PointCloud Place Recognition

This article has 2 authors:
1. Ziyu Fang
2. Minghao Ye
This article has no evaluations. Latest version: Jan 14, 2026

Lite-FARNet: A Light-weight Feedback Attention Residual Network for Efficient Multi-Class Segmentation in Complex Urban Scenes

This article has 3 authors:
1. Jiaxi Yang
2. Jiaquan Shen
3. Shitong Wang
This article has no evaluations. Latest version: Dec 23, 2025
