Enhanced Human Pose Estimation via Self-Distilled and Token-Pruned Transformer


Abstract

Human pose estimation (HPE) is a fundamental challenge in computer vision that aims to detect anatomical keypoints in images. Traditional methods rely on convolutional neural networks (CNNs), but recent Vision Transformer (ViT) models have shown superior performance. However, ViTs often require substantial computational resources. This paper introduces SPTPose, a method that employs self-distillation and token pruning to reduce computational cost while maintaining high accuracy. SPTPose-B achieves an mAP of 74.8% on the MSCOCO validation set with only 13.2 million parameters and 4.7 GFLOPs. The source code is available at https://github.com/duduxx123/SPTPose.
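To make the token-pruning idea concrete, the sketch below keeps only the highest-scoring patch tokens before the remaining transformer blocks. This is a minimal illustration, not the paper's implementation: the function name `prune_tokens`, the keep ratio, and the use of per-token importance scores (e.g. CLS-attention weights) are all assumptions for demonstration.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top-scoring fraction of tokens (hypothetical illustration).

    tokens: (N, D) array of patch-token embeddings
    scores: (N,) per-token importance scores, e.g. CLS-attention weights
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the n_keep highest-scoring tokens
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    keep_idx.sort()  # preserve the original spatial ordering
    return tokens[keep_idx], keep_idx

# 14x14 = 196 patch tokens with ViT-Base embedding dimension 768
tokens = np.random.randn(196, 768)
scores = np.random.rand(196)
pruned, idx = prune_tokens(tokens, scores, keep_ratio=0.25)
print(pruned.shape)  # (49, 768)
```

Pruning three quarters of the tokens shrinks the quadratic self-attention cost in later layers accordingly; in practice the kept indices would be tracked so pruned locations can be restored or interpolated for the dense keypoint heatmaps.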
