Machine Learning for Urban Air Quality Prediction Using Google AlphaEarth Foundations Satellite Embeddings: A Case Study of Quito, Ecuador
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Air quality monitoring in many cities of the Global South is constrained by sparse ground-based networks and persistent cloud cover, limiting the ability to detect fine-scale spatial patterns and temporal trends. This study evaluates the potential of annual multi-sensor satellite embeddings from the AlphaEarth Foundations model in Google Earth Engine to predict and map major air pollutants in Quito, Ecuador, between 2017 and 2024. The 64-dimensional embeddings integrate Sentinel-1 radar, Sentinel-2 optical imagery, Landsat surface reflectance, ERA5-Land climate variables, GRACE terrestrial water storage, and GEDI canopy structure into a compact representation of surface and climatic conditions. Annual median concentrations of NO₂, SO₂, PM₂.₅, CO, and O₃ from the Red Metropolitana de Monitoreo Atmosférico de Quito (REEMAQ) were paired with collocated embeddings and modeled using five machine learning algorithms. Support Vector Regression achieved the highest accuracy for NO₂ and SO₂ (R² = 0.71 for both), capturing fine-scale spatial patterns and multi-year changes, including COVID-19 lockdown-related reductions. PM₂.₅ and CO were predicted with moderate accuracy, while O₃ remained challenging due to its short-term photochemical and meteorological drivers and the mismatch with annual aggregation. SHAP analysis revealed that a small subset of embedding bands dominated predictions for NO₂ and SO₂. The approach provides a scalable and transferable framework for high-resolution urban air quality mapping in data-scarce environments, supporting long-term monitoring, hotspot detection, and evidence-based policy interventions.