Bridging Sparse Air-Quality Monitoring: Machine-Learning Sharpens Daily PM2.5 in 12 Cities Across Two Regions
Abstract
Airborne particles smaller than 2.5 µm (PM2.5) pose significant risks to human health and influence the climate. However, the accuracy of ground- and satellite-based estimates varies widely across regions. To evaluate the potential of machine learning (ML) to improve ground PM2.5 concentration measurements, we analyzed ground-level PM2.5 concentrations in 12 cities across the Greater Middle East (GME) and Canada. We deployed three ML models to enhance daily PM2.5 estimations, using nearly a decade of combined ground-based observations and MODIS-MAIAC aerosol optical depth (AOD), along with a uniform predictor set comprising AOD and meteorological variables. To ensure comparability, each city was anchored to a single regulatory monitor in both Canada and the GME. Using ten-fold cross-validation, ML improved the AOD–PM2.5 correlation from approximately 0.15 to a mean R of 0.59 in Canada and ~0.48 in the GME. A pooled regional model integrating all GME observations achieved high out-of-sample agreement (r ≈ 0.90), compared to an AOD-only fit (r ≈ 0.11). SHAP diagnostics revealed that PM2.5 history (lags and rolling means), AOD, its interaction with physical processes (e.g., temperature and pressure), and boundary-layer height were the dominant drivers in the GME, with more stable influences observed in Canada. PM2.5 levels in Canada rarely exceeded the WHO guideline, whereas exceedances were frequent across all GME cities. These findings demonstrate that ML, particularly when incorporating temporal context and regional pooling, can significantly enhance PM2.5 inference in data-scarce environments. Nonetheless, we emphasize the ongoing need for denser ground monitoring to support high-resolution mapping.
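As an illustration of the "temporal context" features and the ten-fold evaluation mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. The function names (`add_history_features`, `kfold_indices`, `pearson_r`), the lag of one day, the three-day window, and the shuffled fold assignment are all assumptions made for the example; the abstract does not specify these details.

```python
import random


def add_history_features(pm, lag=1, window=3):
    """Build (lagged value, rolling mean) for each day of a daily PM2.5 series.

    Only days strictly before day t are used, so the target for day t never
    leaks into its own predictors. Days without enough history get None.
    The lag and window sizes are illustrative assumptions.
    """
    rows = []
    for t in range(len(pm)):
        if t < max(lag, window):
            rows.append(None)
        else:
            past = pm[t - window:t]  # the `window` days before day t
            rows.append((pm[t - lag], sum(past) / window))
    return rows


def kfold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k disjoint folds after a seeded shuffle.

    Shuffling is an assumption; a time-ordered split is another option.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up daily PM2.5 values, for demonstration only.
pm = [10, 12, 11, 15, 14, 13]
feats = add_history_features(pm, lag=1, window=3)
folds = kfold_indices(25, k=10)
```

In a full workflow, a model would be fitted on nine folds and scored on the held-out fold, with `pearson_r` applied to the pooled out-of-fold predictions; the predictors would also include AOD and meteorological variables alongside the history features.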