Large-Scale Airborne LiDAR Point Cloud Building Extraction Based on Improved Voxelized Deep Learning Network
Abstract
To address the critical challenges of semantic ambiguity, uneven density distribution, and inadequate adaptability to complex structures in large-scale urban LiDAR point cloud building extraction, this paper proposes a novel approach integrating geometric topology perception with cross-dimensional attention mechanisms. Building on the Sparse Voxel Convolutional Neural Network (SPVCNN) framework, we design three key components. First, we propose an enhanced Lasermix++ multi-scale hybrid augmentation algorithm, which employs cross-scene point cloud block replacement with probability-driven sampling, coupled with ground normal-constrained rotation matrices and nonuniform scaling strategies. Second, a collaborative mechanism of Geometric Self-Attention (GSA) and Cross-Space Residual Attention (CSRA) is embedded, for the first time, in the SPVCNN dual-branch framework; topology-preserving encoding of building geometric features is achieved through dynamic voxel granularity adjustment and the GSA module. Finally, we introduce a Boundary Enhancement Module (BEM) to resolve separation challenges in highly overlapping structures and mitigate boundary ambiguity. The experiments use 177 square kilometers of airborne LiDAR data from Washington, D.C., United States. Compared to the baseline SPVCNN (Acc = 0.8212, IoU = 0.866), the proposed GSA-CSRA framework raises accuracy to 0.9416 (+12.04 percentage points) and IoU to 0.9656 (+9.96 percentage points), substantially outperforming attention variants such as Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM). Furthermore, the proposed method improves accuracy by up to roughly 52 percentage points over mainstream point cloud networks, as evidenced by its performance against Cylinder3D (Acc = 0.4189) and MinkResNet (Acc = 0.5328). These results demonstrate the advantages of combining geometric perception with adaptive attention mechanisms for building extraction from point clouds.
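The gains quoted in the abstract appear to be absolute differences between scores rather than relative improvements. The short Python sketch below redoes that arithmetic from the reported figures only. The helper names `point_gain` and `relative_gain` are hypothetical, introduced here for illustration, and the sketch makes no assumption about how Acc and IoU themselves were computed.

```python
# Scores copied from the abstract; variable and helper names are hypothetical.
BASELINE = {"acc": 0.8212, "iou": 0.866}    # SPVCNN baseline
PROPOSED = {"acc": 0.9416, "iou": 0.9656}   # GSA-CSRA framework
OTHERS = {"Cylinder3D": 0.4189, "MinkResNet": 0.5328}  # accuracy only


def point_gain(new: float, old: float) -> float:
    """Absolute gain in percentage points, rounded to two decimals."""
    return round((new - old) * 100, 2)


def relative_gain(new: float, old: float) -> float:
    """Gain relative to the old score, in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


# Gains over the SPVCNN baseline.
acc_points = point_gain(PROPOSED["acc"], BASELINE["acc"])
iou_points = point_gain(PROPOSED["iou"], BASELINE["iou"])
acc_relative = relative_gain(PROPOSED["acc"], BASELINE["acc"])

# Accuracy margins over the other networks named in the abstract.
other_points = {
    name: point_gain(PROPOSED["acc"], acc) for name, acc in OTHERS.items()
}

print("Accuracy gain (points):", acc_points)
print("IoU gain (points):", iou_points)
print("Accuracy gain (relative %):", acc_relative)
print("Accuracy margin over other networks (points):", other_points)
```

Under this reading, the accuracy and IoU gains over SPVCNN come out at 12.04 and 9.96 points, matching the abstract. The accuracy margin is 52.27 points over Cylinder3D and 40.88 points over MinkResNet, so the "exceeding 50%" figure holds only for Cylinder3D. The relative accuracy gain over the baseline is about 14.66%.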