Interpretable Multi-Horizon Machine Learning Framework for PM₂.₅ Forecasting in Tashkent: Toward Early-Warning Air Quality Management
Abstract
Fine particulate matter (PM₂.₅) poses a major environmental and public health risk in Central Asia, yet predictive air-quality modeling remains limited due to fragmented monitoring networks and data scarcity. This study presents an interpretable multi-horizon machine-learning framework for PM₂.₅ forecasting in Tashkent, Uzbekistan, representing the first such analysis for the country. Six models were developed and evaluated under realistic data-limited conditions using hourly air-quality and meteorological observations: linear regression, ridge regression, LASSO, random forest, XGBoost, and long short-term memory (LSTM). Forecasts were generated for three operational horizons (1 h, 24 h, and 168 h). Results show that short-term PM₂.₅ persistence dominates predictive skill, with XGBoost achieving the highest accuracy and stability and outperforming LSTM on fragmented datasets. Feature-selection and SHAP analyses provide transparent insight into dominant pollution drivers, enhancing policy relevance. Spatial aggregation across monitoring stations improves robustness for city-scale early-warning applications, albeit with reduced peak sensitivity. The proposed framework offers a data-efficient and interpretable pathway for operational air-quality management in emerging monitoring contexts.
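To make the multi-horizon setup concrete, the sketch below shows one common way to turn an hourly PM₂.₅ series into lag-feature and target pairs for the 1 h, 24 h, and 168 h horizons, together with a naive persistence baseline against which learned models could be compared. It is an illustrative assumption, not the authors' pipeline: the function names `make_supervised` and `persistence_rmse`, the `n_lags` parameter, the choice to skip windows with missing readings, and the synthetic data are all hypothetical.

```python
from math import sqrt

HORIZONS = (1, 24, 168)  # forecast horizons in hours, as listed in the abstract


def make_supervised(series, horizon, n_lags=3):
    """Pair the n_lags readings ending at hour t with the reading at t + horizon.

    Windows containing a missing reading (None) are skipped, which is one
    simple (assumed) way to handle fragmented monitoring records.
    """
    rows = []
    for t in range(n_lags - 1, len(series) - horizon):
        lags = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        if target is None or any(v is None for v in lags):
            continue
        rows.append((lags, target))
    return rows


def persistence_rmse(series, horizon):
    """RMSE of the naive forecast 'value at t + horizon equals value at t'."""
    rows = make_supervised(series, horizon, n_lags=1)
    if not rows:
        return None  # no complete (input, target) pair at this horizon
    squared_errors = [(lags[-1] - target) ** 2 for lags, target in rows]
    return sqrt(sum(squared_errors) / len(rows))


if __name__ == "__main__":
    # Synthetic two-week hourly series with a daily cycle, for illustration only.
    demo = [30.0 + 10.0 * ((hour % 24) / 24.0) for hour in range(24 * 14)]
    demo[50] = None  # simulate a gap in the record
    for horizon in HORIZONS:
        print(horizon, persistence_rmse(demo, horizon))
```

Any of the six model families could be fitted on the pairs returned by `make_supervised`, one model per horizon; reporting the persistence error alongside gives a reference point for judging how much skill a model adds beyond simple carry-forward of the latest reading.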