From Empirical Curves to AI-Derived Rainfall Thresholds for Landslide Initiation in Peninsular Malaysia
Abstract
Rainfall-induced landslides are a persistent hazard in Malaysia, yet existing rainfall thresholds are largely derived by empirical methods and often lack regional adaptability. This study derives machine learning (ML)-based rainfall thresholds for landslide initiation in Peninsular Malaysia. A dataset of rainfall events from 70 rainfall stations across Peninsular Malaysia, linked with 79 documented landslides, was analysed using key predictors: event cumulative rainfall (ECR), maximum and mean intensity, duration, and antecedent rainfall windows (3–20 days). Two state-of-the-art gradient boosting algorithms, CatBoost and XGBoost, were trained to classify rainfall events as landslide-triggering or non-landslide-triggering. Model performance was evaluated using a confusion matrix, accuracy, precision, recall, F1-score, and ROC-AUC. In addition, SHAP explainability analysis was applied to assess the relative importance of the rainfall metrics in threshold exceedance. CatBoost showed better practical reliability, with higher accuracy (0.83) and recall (0.67) than XGBoost, which achieved a higher ROC-AUC (0.876) but substantially lower recall (0.33). These findings indicate that ML-derived rainfall thresholds for Peninsular Malaysia offer a more flexible and reliable basis for early warning systems, supporting landslide risk management in Malaysia.
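To make the predictors named in the abstract concrete, the following is a minimal Python sketch of how ECR, duration, mean and maximum intensity, and antecedent rainfall might be computed for one rainfall event. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the temporal resolution of the data, the event-separation rule, or the exact antecedent windows, so the daily resolution, the inclusive `start`/`end` indices, the window set `(3, 5, 10, 20)`, and the names `EventPredictors` and `event_predictors` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class EventPredictors:
    ecr_mm: float                      # event cumulative rainfall (ECR)
    duration_days: int                 # number of days in the event
    mean_intensity_mm_per_day: float   # ECR divided by duration
    max_intensity_mm_per_day: float    # wettest single day in the event
    antecedent_mm: Dict[int, float]    # window length (days) -> rainfall before the event


def event_predictors(
    daily_rain_mm: Sequence[float],
    start: int,
    end: int,
    windows: Sequence[int] = (3, 5, 10, 20),  # assumed windows within the 3-20 day range
) -> EventPredictors:
    """Compute rainfall predictors for the event spanning days start..end (inclusive).

    Assumes a daily rainfall series in millimetres; the source does not state
    the resolution actually used.
    """
    if not 0 <= start <= end < len(daily_rain_mm):
        raise ValueError("start/end must be valid indices with start <= end")

    event = daily_rain_mm[start:end + 1]
    ecr = float(sum(event))
    duration = len(event)

    # Antecedent rainfall: total over the w days immediately before the event.
    # Windows reaching before the start of the record are truncated.
    antecedent = {}
    for w in windows:
        lo = max(0, start - w)
        antecedent[w] = float(sum(daily_rain_mm[lo:start]))

    return EventPredictors(
        ecr_mm=ecr,
        duration_days=duration,
        mean_intensity_mm_per_day=ecr / duration,
        max_intensity_mm_per_day=float(max(event)),
        antecedent_mm=antecedent,
    )
```

For example, with the made-up series `[0, 2, 0, 5, 1, 0, 0, 10, 40, 25, 0]` and an event on days 7 to 9, this gives an ECR of 75 mm over 3 days, a mean intensity of 25 mm/day, a maximum intensity of 40 mm/day, and a 3-day antecedent rainfall of 1 mm. A table of such rows, one per event and labelled as landslide-triggering or not, is the kind of input a gradient boosting classifier would be trained on.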