Comparative Performance of Machine Learning Models for Landslide Susceptibility Assessment: Impact of Sampling Strategies in Highway Buffer Zone

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Landslide susceptibility assessment is critical for hazard mitigation and land-use planning. This study evaluates the impact of two different non-landslide sampling methods—random sampling and sampling constrained by the Global Landslide Hazard Map (GLHM)—on the performance of various machine learning and deep learning models, including Naïve Bayes (NB), Support Vector Machine (SVM), SVM-Random Forest hybrid (SVM-RF), and XGBoost. The study area is a 2 km buffer zone along the Duku Highway in Xinjiang, China, with 101 landslide and 101 non-landslide points extracted by aforementioned sampling methods. Models were tested using ROC curves and non-parametric significance tests based on 20 repetitions of 5-fold spatial cross-validation data. GLHM sampling consistently improved AUROC and accuracy across all models (e.g., AUROC gains: NB +8.44, SVM +7.11, SVM-RF +3.45, XGBoost +3.04; accuracy gains: NB +11.30%, SVM +8.33%, SVM-RF +7.40%, XGBoost +8.31%). XGBoost delivered the best performance under both sampling strategies, reaching 94.61% AUROC and 84.30% accuracy with GLHM sampling. SHAP analysis showed that GLHM sampling stabilized feature importance rankings, highlighting STI, TWI and NDVI are the main controlling factors for landslides in the study area. These results highlight the importance of hazard-informed sampling to enhance landslide susceptibility modeling accuracy and interpretability.

Article activity feed