SuperSegmentation: KeyPoint Detection and Description with Semantic Labeling for VSLAM

Abstract

We propose SuperSegmentation, a unified, fully convolutional architecture for semantic keypoint correspondence in dynamic urban scenes. The model extends SuperPoint's self-supervised interest point detector–descriptor backbone with a DeepLab-style Atrous Spatial Pyramid Pooling (ASPP) head for semantic segmentation and a lightweight sub-pixel regression branch. Using Cityscapes camera intrinsics and extrinsics to construct geometry-aware homographies, SuperSegmentation jointly predicts keypoints, descriptors, semantic labels (e.g., static vs. dynamic classes), and sub-pixel offsets from a shared encoder. Our experiments are conducted on Cityscapes: a backbone pretrained on MS-COCO with strong random homographies over approximately planar images is fine-tuned with deliberately attenuated synthetic warps, because reusing the aggressive COCO-style homographies on Cityscapes produced unrealistically large distortions. Within this controlled setting, adding semantic masking and sub-pixel refinement consistently improves keypoint stability on static structures and suppresses keypoints on dynamic or ambiguous regions.
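The post-processing step implied by the abstract, combining the keypoint heatmap, the semantic labels, and the sub-pixel offsets at inference time, can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function name, the (H, W)-shaped map layout, and the threshold are all assumptions made for the example.

```python
import numpy as np

def select_keypoints(score_map, sem_labels, offsets, dynamic_ids, thresh=0.5):
    """Hypothetical fusion of the three heads described in the abstract.

    score_map  : (H, W) keypoint confidence in [0, 1]
    sem_labels : (H, W) integer semantic class per pixel
    offsets    : (H, W, 2) predicted (dy, dx) sub-pixel shifts
    dynamic_ids: set of class ids to suppress (e.g. vehicles, pedestrians)

    Returns an (N, 2) array of refined (y, x) keypoint coordinates.
    """
    # Candidate detections above the confidence threshold.
    ys, xs = np.where(score_map > thresh)
    # Semantic masking: keep only keypoints on static classes.
    keep = ~np.isin(sem_labels[ys, xs], list(dynamic_ids))
    ys, xs = ys[keep], xs[keep]
    # Sub-pixel refinement: shift each surviving point by its offset.
    dyx = offsets[ys, xs]
    return np.stack([ys + dyx[:, 0], xs + dyx[:, 1]], axis=1)
```

For example, a detection falling on a pixel labeled with a dynamic class id is dropped entirely, while a detection on a static structure is kept and moved by its predicted fractional offset before being handed to the downstream VSLAM matcher.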
