Advancing HGV Detection with Limited Data: A Semantic Segmentation Framework Using SLiMe
Abstract
Heavy Goods Vehicle (HGV) segmentation is an essential enabler for intelligent transportation systems, accurate vehicle assessment, and traffic law enforcement. However, classical deep learning methods rely on dense pixel-level annotations, which are expensive and time-consuming to collect, particularly for specific HGV categories and rare traffic scenes. In this paper, we utilise a one-shot segmentation method, SLiMe (Segment Like Me), which employs the cross-attention and self-attention maps of a pre-trained Stable Diffusion model to transfer knowledge from a single annotated reference image to unseen target scenes. Compared with closed-set object detectors such as YOLOv11 or generic promptable segmenters such as SAM and CLIPSeg, SLiMe performs class-specific segmentation from a single annotated example, eliminating retraining and prompt-engineering effort. Our approach demonstrates strong segmentation quality on HGVs across varied urban traffic scenes, outperforming baseline models in mean IoU and pixel accuracy, particularly under challenging lighting conditions or occlusion. These results support deploying SLiMe in transportation scenarios that require fine-grained semantic understanding with limited annotation, with a focus on real-time inference performance, highlighting its relevance for smart cities and autonomous-vehicle perception systems.
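To make the underlying mechanism concrete, the minimal PyTorch sketch below shows how a cross-attention map relates text-token embeddings to spatial UNet features, the signal SLiMe exploits: one token's attention column, reshaped to the feature grid, acts as a coarse mask for that token's concept. This is an illustrative sketch, not the authors' implementation; the projection layers, tensor shapes, and token index are assumptions chosen to mirror typical Stable Diffusion dimensions.

```python
import torch
import torch.nn.functional as F

def cross_attention_map(spatial_feats, token_embeds, dim_head=64):
    """spatial_feats: (B, H*W, C) UNet features; token_embeds: (B, T, C_txt).
    Returns (B, H*W, T): a per-pixel distribution over text tokens."""
    b, hw, c = spatial_feats.shape
    # Hypothetical linear projections standing in for the UNet's trained Q/K weights.
    to_q = torch.nn.Linear(c, dim_head, bias=False)
    to_k = torch.nn.Linear(token_embeds.shape[-1], dim_head, bias=False)
    q = to_q(spatial_feats)                       # (B, H*W, d)
    k = to_k(token_embeds)                        # (B, T, d)
    attn = (q @ k.transpose(1, 2)) / dim_head ** 0.5
    return attn.softmax(dim=-1)

# Toy usage with shapes typical of a Stable Diffusion UNet block (assumed here).
feats = torch.randn(1, 32 * 32, 320)              # a 32x32 feature map, 320 channels
tokens = torch.randn(1, 77, 768)                  # CLIP text embeddings, 77 tokens
amap = cross_attention_map(feats, tokens)         # (1, 1024, 77)
mask = amap[0, :, 5].reshape(32, 32)              # attention for a hypothetical token #5
mask = F.interpolate(mask[None, None], size=(512, 512), mode="bilinear")[0, 0]
binary = (mask > mask.mean()).float()             # crude threshold -> coarse binary mask
```

In SLiMe, roughly speaking, the text embeddings are optimised so that maps of this kind reproduce the single annotated reference mask; at inference the same embeddings yield masks for unseen scenes without any retraining.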