Explainable Ensemble-Based Unsupervised Anomaly Detection in Urban Air Pollution Time Series
Abstract
Urban air pollution poses complex challenges due to its multivariate, non-linear dynamics and significant implications for public health and environmental sustainability. This study presents a comprehensive, explainable, and unsupervised anomaly detection framework for identifying atypical pollution events in multivariate time series from urban monitoring networks. We analyse long-term records of five key atmospheric pollutants (PM₂.₅, PM₁₀, NO, NO₂, NOₓ) from six air quality stations across Valencia, Spain, using an ensemble of twelve state-of-the-art unsupervised algorithms. The ensemble anomaly scores are interpreted through dimensionality reduction (UMAP), clustering, and SHAP-based feature attribution using a surrogate XGBoost model. Our results reveal distinct classes of pollution anomalies driven by traffic, meteorological conditions, biomass burning, and extreme weather events. The model successfully captures both abrupt pollution episodes and more subtle multivariate deviations, demonstrating high internal coherence and robustness across algorithmic families. By integrating explainable machine learning with ensemble detection strategies, this framework enhances the reliability of air quality data, supports environmental diagnostics, and enables context-aware responses to pollution episodes. The approach is scalable, transferable, and well-suited for improving anomaly detection and interpretation in environmental monitoring systems.
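The ensemble step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The two detectors (a median/MAD robust z-score and a k-nearest-neighbour distance), the rank-averaging rule, the function names, and the synthetic two-pollutant readings are all assumptions chosen for illustration. The abstract does not name the twelve algorithms or state how their scores are combined, and the UMAP, clustering, and SHAP stages are omitted here.

```python
import math
import random
import statistics


def robust_z_scores(rows):
    """Per-row score: largest absolute median/MAD z-score across features."""
    scores = [0.0] * len(rows)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        med = statistics.median(column)
        # Fall back to a tiny constant if the MAD is zero.
        mad = statistics.median(abs(v - med) for v in column) or 1e-9
        for i, v in enumerate(column):
            scores[i] = max(scores[i], abs(v - med) / mad)
    return scores


def knn_distance_scores(rows, k=5):
    """Per-row score: Euclidean distance to the k-th nearest neighbour."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


def to_unit_ranks(scores):
    """Map raw scores to ranks in [0, 1] so detectors become comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position / (len(scores) - 1)
    return ranks


def ensemble_scores(rows, detectors):
    """Average the rank-normalised scores of all detectors (assumed rule)."""
    ranked = [to_unit_ranks(detector(rows)) for detector in detectors]
    return [sum(col) / len(col) for col in zip(*ranked)]


# Synthetic stand-in for two pollutant channels (hypothetical values).
random.seed(0)
readings = [[random.gauss(20, 3), random.gauss(35, 5)] for _ in range(200)]
readings.append([95.0, 160.0])  # injected pollution spike at index 200

scores = ensemble_scores(readings, [robust_z_scores, knn_distance_scores])
top = max(range(len(scores)), key=lambda i: scores[i])
print("most anomalous row:", top, readings[top])
```

Rank normalisation is used here because raw scores from different detector families live on different scales. Averaging ranks keeps any single detector from dominating the ensemble. In a real pipeline the features would typically be standardised before distance-based detectors are applied.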