Explainable Machine Learning for Climate Change Hotspot Identification: Spatial Generalization Testing in South Asian Region
Abstract
Machine learning (ML) is widely used in climate-change research, yet many analyses overestimate model performance because they do not account for spatial dependence. This methodological gap creates the misleading impression that models generalize well to new locations, when in fact they often fail outside the training domain. We evaluate eight ML models for predicting temperature anomalies across 22 districts of Sindh, Pakistan, using 44 years of observations. We assess spatial generalization with Leave-One-District-Out Cross-Validation (LODO-CV), in which each district is withheld from training in turn so that the model is tested on entirely unseen locations. Gradient Boosting performed best, achieving $R^2 = 0.914 \pm 0.098$ when predicting temperature anomalies in districts excluded from training, which indicates strong transferability across the region's diverse climatic zones. SHAP feature attribution showed that climate variables (37.6\%), temporal trends (32.0\%), and anthropogenic proxies (23.7\%) are the most important predictor groups; the importance of the proxies reflects correlation, not causation, and must be interpreted carefully in policy applications. Partial dependence analysis estimated a negative vegetation-temperature relationship of $-0.15^{\circ}$C per 0.1 increase in NDVI, suggesting that vegetation preservation and restoration may provide cooling benefits. Using a dual-index model that integrates the frequency of extreme events with changes in average climate, we identified seven climate change hotspots, concentrated in the Karachi and Hyderabad urban areas, which face compound risk from urbanization, coastal exposure, and rising temperature extremes.
The results indicate an urgent need for spatially explicit validation procedures in climate ML and offer practical guidance for targeted adaptation planning in Pakistan's most climate-vulnerable districts.
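
The leave-one-district-out idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the district labels `A` to `E` are hypothetical, the slope and noise level are arbitrary, and a one-feature least-squares line stands in for Gradient Boosting so that the example needs only the Python standard library. The helper names (`lodo_splits`, `fit_line`, `r2_score`) are assumptions introduced for illustration.

```python
import random
from statistics import mean, pstdev


def lodo_splits(districts):
    """Yield (held_out, train_idx, test_idx), withholding one district per fold."""
    for held_out in sorted(set(districts)):
        train_idx = [i for i, d in enumerate(districts) if d != held_out]
        test_idx = [i for i, d in enumerate(districts) if d == held_out]
        yield held_out, train_idx, test_idx


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for a real model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): 5 hypothetical districts, 44 records each.
rng = random.Random(0)
districts, ndvi, anomaly = [], [], []
for d in ["A", "B", "C", "D", "E"]:
    for _ in range(44):
        x = rng.uniform(0.1, 0.6)
        districts.append(d)
        ndvi.append(x)
        # Arbitrary made-up relationship plus Gaussian noise.
        anomaly.append(1.0 - 1.5 * x + rng.gauss(0.0, 0.05))

# Fit on all other districts, score on the withheld one.
fold_scores = {}
for held_out, train_idx, test_idx in lodo_splits(districts):
    intercept, slope = fit_line(
        [ndvi[i] for i in train_idx], [anomaly[i] for i in train_idx]
    )
    preds = [intercept + slope * ndvi[i] for i in test_idx]
    fold_scores[held_out] = r2_score([anomaly[i] for i in test_idx], preds)

scores = list(fold_scores.values())
mean_r2, std_r2 = mean(scores), pstdev(scores)
print(f"LODO-CV R^2 = {mean_r2:.3f} +/- {std_r2:.3f}")
```

The point of the design is that no record from the held-out district appears in the training fold, so each per-fold $R^2$ measures transfer to an unseen location; a random row-level split would mix records from the same district across training and test sets and could inflate the score. Reporting the mean and spread across folds mirrors the $R^2 = \text{mean} \pm \text{spread}$ form used in the abstract, though the abstract does not state which spread statistic was used.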