Advancing Urban Roof Segmentation: Transformative Deep Learning Models from CNNs to Transformers for Scalable and Accurate Urban Imaging Solutions. A Case Study in Ben Guerir City, Morocco

Hachem Saadaoui
Saad Farah
Hatim Lechgar
Abdellatif Ghennioui
Hassan Rhinane

Abstract

Urban roof segmentation plays a pivotal role in applications such as urban planning, infrastructure management, and renewable energy deployment. This study traces the evolution of deep learning techniques for roof segmentation from satellite imagery, from traditional Convolutional Neural Networks (CNNs) to Transformer-based models. We highlight the limitations of conventional methods in urban environments, including resolution constraints and the complexity of roof structures. To address these challenges, we evaluate two deep learning models, Mask R-CNN and MaskFormer, both of which have shown significant promise in accurately segmenting roofs, even in dense urban settings with diverse roof geometries. These models, especially the transformer-based MaskFormer, improve segmentation accuracy by capturing both global and local image features, which matters in tasks where fine detail and contextual awareness are critical. A case study on Ben Guerir City, Morocco, an urban area experiencing rapid development, serves as the test bed. Using high-resolution satellite imagery, the segmentation results show how accurate and effective these models are, particularly for urban planning and renewable energy assessments. Model performance is benchmarked with Intersection over Union (IoU), precision, recall, and F1-score. Mask R-CNN achieved a mean IoU of 74.6%, precision of 81.3%, recall of 78.9%, and F1-score of 80.1%. MaskFormer outperformed Mask R-CNN, reaching a mean IoU of 79.8%, precision of 85.6%, recall of 82.7%, and F1-score of 84.1%, highlighting the potential of transformer-based architectures for scalable and precise urban imaging. The study also outlines future work in 3D modelling and height estimation, positioning these advances as tools for sustainable urban development.
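For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they can be computed for a single binary roof mask. It is not the authors' evaluation code: the function names, the nested-list mask format, and the toy masks are all assumptions made for illustration, and the "mean IoU" reported in the abstract may be averaged over images or classes in a way this per-pixel sketch does not reproduce. The second helper only checks that the reported F1-scores are consistent with the reported precision and recall.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # hypothetical format: rows of 0/1 pixels


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Per-pixel IoU, precision, recall and F1 for one binary mask pair."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # roof pixel correctly predicted
            elif p and not t:
                fp += 1  # predicted roof, actually background
            elif t and not p:
                fn += 1  # missed roof pixel

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy 2x3 masks, invented purely for illustration.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

metrics = segmentation_metrics(pred, truth)
print(metrics["iou"])  # 2 true positives / 4 pixels in the union = 0.5

# Consistency check against the percentages quoted in the abstract.
print(round(f1_from_precision_recall(81.3, 78.9), 1))  # Mask R-CNN
print(round(f1_from_precision_recall(85.6, 82.7), 1))  # MaskFormer
```

Applied to the precision and recall figures quoted in the abstract, the harmonic-mean helper rounds to the reported F1-scores of 80.1% and 84.1%, so the four reported numbers for each model are mutually consistent.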

Version published to 10.20944/preprints202507.1048.v1
Jul 15, 2025

Related articles

A Hybrid RNN-CNN Approach with TPI for High-Precision DEM Reconstruction

This article has 6 authors:
1. Ruizhe Cao
2. Chunjing Yao
3. Hongchao Ma
4. Bin Guo
5. Jie Wang
6. Junhao Xu
This article has no evaluations. Latest version: Jun 17, 2025

An Improved HRNetV2-Based Algorithm for Semantic Segmentation of Corroded Regions in Urban Drainage Pipes

This article has 1 author:
1. Gao Liang
This article has no evaluations. Latest version: Jun 26, 2025

Deep-Learning Integration of CNN–Transformer and U-Net for Bi-Temporal SAR Flash-Flood Detection

This article has 3 authors:
1. Abbas Mohammed Noori
2. Abdul Razzak T. Ziboon
3. Amjed N. AL-Hameedawi
This article has no evaluations. Latest version: Jul 10, 2025
