Explainable Cross-Dataset Transfer Learning in Visual Scene Segmentation
Abstract
Semantic segmentation plays a pivotal role in autonomous driving by providing the pixel-level scene understanding that is essential for reliable perception and decision-making. Transformer-based architectures such as SegFormer have demonstrated state-of-the-art performance on large-scale benchmarks; however, their scalability and generalization to smaller or geographically diverse datasets remain underexplored. In this work, we investigate the scalability and transferability of SegFormer variants (B3, B4, B5) using CamVid as a base dataset, followed by cross-dataset transfer learning to KITTI and IDD. Beyond accuracy, we incorporate explainable AI techniques to assess model interpretability, employing confidence-based heatmaps to reveal class-level reliability and highlight regions of uncertainty in predictions. Our findings show that SegFormer-B5 achieves the highest performance on CamVid (82.4% mIoU), while transfer learning from CamVid improves mIoU on KITTI by 2.57% and enhances class-specific predictions on IDD by more than 70%. These results demonstrate not only the robustness of SegFormer in diverse driving scenarios but also the added value of explainability in interpreting model decisions, identifying dataset-specific challenges, and supporting safer deployment in real-world segmentation systems.
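The confidence-based heatmaps described above can be illustrated with a minimal sketch. Here, per-pixel confidence is taken as the maximum softmax probability over the class axis of a segmentation model's logits; the function name and this particular confidence measure are assumptions for illustration, and the paper's exact procedure may differ:

```python
import numpy as np

def confidence_heatmap(logits):
    """Derive a per-pixel confidence map from segmentation logits.

    logits: array of shape (num_classes, H, W), e.g. from a SegFormer head.
    Returns (heatmap, labels): heatmap of shape (H, W) with values in
    (0, 1] giving the max softmax probability per pixel, and labels of
    shape (H, W) with the predicted class index per pixel.
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    # High values = confident pixels; low values flag uncertain regions
    # (e.g. class boundaries or rare classes after cross-dataset transfer).
    return probs.max(axis=0), probs.argmax(axis=0)

# Toy example: 3 classes on a 4x4 "image" of random logits.
rng = np.random.default_rng(0)
heatmap, labels = confidence_heatmap(rng.normal(size=(3, 4, 4)))
```

Overlaying such a heatmap on the input image makes low-confidence regions visible at a glance, which is how class-level reliability and dataset-specific failure modes can be inspected.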