SimpleCID: Skeleton-Guided Lightweight Heatmap Refinement for Robust Multi-Person Pose Estimation in Crowded Scenes

Abstract

Multi-person pose estimation in crowded scenes remains difficult because heavy overlap and occlusion frequently break independently predicted keypoint heatmaps. Contextual Instance Decoupling (CID) improves crowded-scene estimation by generating instance-aware feature maps, yet its final joint heatmaps are still produced channel by channel without explicit structural coupling. This paper presents SimpleCID, a lightweight refinement built on top of CID. After the Global Feature Decoupling stage, we model the keypoints of each person as nodes in a human-body graph and propagate responses through a fixed normalized adjacency matrix. The refined response is fused with the original heatmap by a residual connection with a small coefficient, allowing adjacent joints to provide structural support while preserving the baseline prediction. The module introduces no additional trainable parameters, keeps the original training pipeline unchanged, and adds only lightweight matrix multiplication along the joint dimension. On crowded-scene benchmarks, SimpleCID consistently improves the baseline: it raises AP by 1.2 points on CrowdPose and improves OCHuman AP from 41.4 to 43.3. Qualitative comparisons further show more complete limb recovery and fewer anatomically inconsistent predictions under severe occlusion. These results demonstrate that explicit yet simple skeleton reasoning is an effective complement to contextual instance decoupling.
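
The refinement step described above (propagating per-joint responses through a fixed normalized adjacency matrix, then fusing the result with the original heatmap through a small residual coefficient) can be illustrated with a minimal NumPy sketch. The abstract does not give the normalization scheme, the skeleton definition, or the coefficient value, so everything below is an assumption made for illustration: symmetric normalization with self-loops, a toy 4-joint chain, alpha = 0.1, and the hypothetical function names normalized_adjacency and refine_heatmaps. It is not the authors' implementation.

```python
import numpy as np


def normalized_adjacency(num_joints, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph.

    Symmetric normalization with self-loops is an assumption; the abstract
    only states that the adjacency matrix is fixed and normalized.
    """
    adj = np.eye(num_joints)  # self-loops keep each joint's own response
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def refine_heatmaps(heatmaps, adj_norm, alpha=0.1):
    """Residual fusion: heatmaps + alpha * (adj_norm applied along joints).

    heatmaps has shape (K, H, W) for one person instance. alpha = 0.1 is a
    placeholder for the "small coefficient" mentioned in the abstract.
    """
    k, h, w = heatmaps.shape
    # One matrix multiplication along the joint dimension, no trainable weights.
    propagated = (adj_norm @ heatmaps.reshape(k, -1)).reshape(k, h, w)
    return heatmaps + alpha * propagated


# Toy 4-joint chain (hypothetical skeleton, not a real keypoint layout).
edges = [(0, 1), (1, 2), (2, 3)]
adj_norm = normalized_adjacency(4, edges)

rng = np.random.default_rng(0)
heatmaps = rng.random((4, 8, 8))  # stand-in for per-instance joint heatmaps
refined = refine_heatmaps(heatmaps, adj_norm)
```

Because the adjacency matrix is fixed, the step adds no learnable parameters, and setting alpha to zero recovers the baseline heatmaps exactly, which matches the abstract's claim that the baseline prediction is preserved.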
