Render‑Rank‑Refine: Accurate 6D Indoor Localization via Circular Rendering
Abstract
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robot navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which can lead to failures in ambiguous, overlapping-room layouts. We present Render-Rank-Refine, a two-stage framework that operates on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views rendered from the top-$k$ candidates are compared to the query via rotation-invariant circular descriptors, reranking the matches before a final translation and rotation refinement. Overall, our method reduces translation and rotation error by an average of 40% and 29%, respectively, compared to the baseline, while achieving more than $90\%$ improvement in cases with severe layout ambiguity. It sustains 25–27 queries per second (QPS), about 12 times faster than the existing state of the art, without sacrificing accuracy. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and avoids heavy geometric assumptions.
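The retrieve-then-rerank step can be illustrated with a minimal sketch. The abstract does not specify how the circular descriptors are built, so this example makes a loud assumption: each view is summarized as a ring of scalar samples around the viewing direction, and rotation invariance comes from taking DFT magnitudes, which do not change under cyclic shifts. The names `circular_descriptor`, `rerank`, the candidate tuples, and the toy values are all hypothetical and are not taken from the paper.

```python
import cmath
import math


def circular_descriptor(ring):
    """DFT magnitudes of a ring of samples (assumed representation).

    A cyclic shift of the ring, i.e. an in-plane rotation of the view,
    only changes the DFT phases, so the magnitudes stay the same.
    """
    n = len(ring)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * f * i / n) for i, v in enumerate(ring)))
        for f in range(n // 2 + 1)
    ]


def distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank(query_ring, candidates, k=3):
    """Keep the k best coarse hypotheses, then reorder them by descriptor distance.

    candidates: list of (pose_id, coarse_score, ring); lower coarse_score is better.
    """
    top_k = sorted(candidates, key=lambda c: c[1])[:k]
    q = circular_descriptor(query_ring)
    return sorted(top_k, key=lambda c: distance(q, circular_descriptor(c[2])))


# Toy data: pose_b sees the same ring as the query, rotated by two samples.
query = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
rotated = query[2:] + query[:2]
candidates = [
    ("pose_a", 0.10, [2.0, 2.0, 2.0, 3.0, 3.0, 3.0]),
    ("pose_b", 0.20, rotated),
    ("pose_c", 0.30, [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]),
    ("pose_d", 0.90, query),  # falls outside the top-k and is never reranked
]
ranked = rerank(query, candidates, k=3)
print([pose_id for pose_id, _, _ in ranked])
```

In this toy setup the coarse score prefers `pose_a`, but the rotation-invariant comparison moves `pose_b` to the front because its ring differs from the query only by a rotation. A real system would follow this with the translation and rotation refinement described in the abstract, which the sketch leaves out.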