MAQT: Multi-scale Attention and Query-Optimized Transformer for End-to-End Pose Estimation

Abstract

Human pose estimation is attracting rapidly growing attention as a crucial area of computer vision. To address the shortcomings of existing Transformer-based pose estimation methods in handling local features, this work presents MAQT, an enhanced end-to-end method for precise multi-person pose estimation. To improve the localization of keypoints that are sensitive to scale changes, MAQT introduces an Asym-Fusion block. We also design a new query strategy, Uncertainty-minimal Query Selection, to optimize the initial selection of queries. In the decoding phase, MAQT combines two self-attention mechanisms to capture the intricate relationships among keypoints more accurately. Experimental results on the MS COCO and CrowdPose datasets show that MAQT outperforms existing methods.
