Suitability of Machine Learning Models and their Performance for PM2.5 Estimation using high-resolution satellite-driven datasets over Northwest India
Abstract
The estimation of PM2.5 levels using high-resolution satellite-driven datasets and machine learning algorithms represents a potential advancement in air quality monitoring over Northwest (NW) India. Traditional ground-based PM2.5 measurements, while accurate, suffer from limited spatial coverage, which prompts the need for satellite-based retrieval methods. Machine learning (ML) algorithms can convert high-resolution satellite-derived Aerosol Optical Depth (AOD) into PM2.5 and improve the accuracy of this conversion. This study therefore presents a satellite-driven PM2.5 estimation framework at 1 km resolution over the under-covered NW India, which combines Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD with meteorology through ML algorithms. The study uses XGBoost, random forest (RF), support vector machine (SVM), and AdaBoost models to integrate MAIAC AOD with meteorological variables. The datasets for 2022 to 2023 were pre-processed, aligned with ground observations, and optimized for accuracy. RF and XGBoost (R² = 0.91 and 0.91, RMSE = 29.34 µg/m³ and 32.19 µg/m³, Bias = 0.30 µg/m³ and 0.48 µg/m³, respectively) outperform AdaBoost and SVM over NW India. The estimated PM2.5 values exceed the National Ambient Air Quality Standards (NAAQS), with mean 24-hour and annual average concentrations of 74.05 µg/m³ and 70.53 µg/m³, underlining severe air pollution in the region. By leveraging high-resolution satellite data and advanced ML techniques, this study offers a novel and scalable solution for PM2.5 estimation in data-scarce regions. This fusion approach provides actionable insights for air quality monitoring and policymaking, better captures the complexity of PM2.5 variability, and supports predictive models that contribute to efficient air quality management.
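The three scores reported for the models (R², RMSE, bias) and the NAAQS comparison can be made concrete with a short sketch. This is not the authors' evaluation code. It assumes R² is the coefficient of determination (1 - SS_res / SS_tot) and that bias is the mean of predicted minus observed PM2.5, so a positive bias means overestimation. The function names `evaluate` and `exceeds_naaqs` are hypothetical. The limits of 60 µg/m³ (24-hour) and 40 µg/m³ (annual) are India's NAAQS values for PM2.5.

```python
import math
from typing import Dict, Sequence

# India's NAAQS limits for PM2.5 in µg/m³ (CPCB, 2009).
NAAQS_PM25_24H = 60.0
NAAQS_PM25_ANNUAL = 40.0


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: score PM2.5 predictions against ground observations.

    Assumptions: R² is the coefficient of determination, RMSE is the root of the
    mean squared residual, and bias is mean(predicted - observed).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residuals are predicted minus observed, so positive bias = overestimation.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R² is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,
    }


def exceeds_naaqs(mean_24h: float, annual_mean: float) -> Dict[str, bool]:
    """Hypothetical helper: flag whether 24-hour and annual means exceed the NAAQS."""
    return {
        "24h": mean_24h > NAAQS_PM25_24H,
        "annual": annual_mean > NAAQS_PM25_ANNUAL,
    }


if __name__ == "__main__":
    # Toy station values in µg/m³, invented purely for illustration.
    obs = [50.0, 80.0, 120.0, 60.0]
    pred = [55.0, 75.0, 130.0, 58.0]
    print(evaluate(obs, pred))
    # Mean concentrations reported in the abstract.
    print(exceeds_naaqs(74.05, 70.53))
```

With the reported means, `exceeds_naaqs(74.05, 70.53)` returns `{'24h': True, 'annual': True}`, which matches the stated exceedance of both standards. Keeping the residual sign convention explicit matters because a reported bias such as 0.30 µg/m³ is only interpretable once the direction (predicted minus observed, or the reverse) is fixed.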