Explainable Transfer Learning of Cross-Dataset in Visual Scene Segmentation

Abstract

Semantic segmentation plays a pivotal role in autonomous driving by providing pixel-level scene understanding that is essential for reliable perception and decision-making. Transformer-based architectures such as SegFormer have demonstrated state-of-the-art performance on large-scale benchmarks; however, their scalability and generalization to smaller or geographically diverse datasets remain underexplored. In this work, we investigate the scalability and transferability of SegFormer variants (B3, B4, B5) using CamVid as a base dataset, followed by cross-dataset transfer learning to KITTI and IDD. Beyond accuracy, we incorporate explainable AI techniques to assess model interpretability, employing confidence-based heatmaps to reveal class-level reliability and highlight regions of uncertainty in predictions. Our findings show that SegFormer-B5 achieves the highest performance on CamVid (82.4% mIoU), while transfer learning from CamVid improves mIoU on KITTI by 2.57% and enhances class-specific predictions in IDD by more than 70%. These results demonstrate not only the robustness of SegFormer in diverse driving scenarios but also the added value of explainability in interpreting model decisions, identifying dataset-specific challenges, and supporting safer deployment in real-world segmentation systems.
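The confidence-based heatmaps mentioned above can be illustrated with a minimal sketch: for each pixel, take the maximum softmax probability across classes as a confidence score, so low-confidence regions stand out as areas of uncertainty. This is a generic illustration of the idea, not the paper's exact implementation; the function name and toy input shapes are our own.

```python
import numpy as np

def confidence_heatmap(logits):
    """Per-pixel confidence map from raw segmentation logits.

    logits: array of shape (C, H, W) -- one score per class per pixel.
    Returns (heatmap, labels): the max softmax probability (confidence)
    and the argmax class index for each pixel.
    """
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=0, keepdims=True)
    heatmap = probs.max(axis=0)    # confidence in [1/C, 1]
    labels = probs.argmax(axis=0)  # predicted class per pixel
    return heatmap, labels

# Toy example: 3 classes over a 2x2 "image" of random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 2, 2))
heat, pred = confidence_heatmap(logits)
```

In practice the heatmap would be rendered over the input image (e.g. with a colormap), so that systematically low-confidence classes or regions, such as rare classes in a new dataset, become visible at a glance.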
