Policy-sensitive feature-driven machine learning for urban air quality prediction and environmental governance applications
Abstract
As one of the major environmental problems facing the world today, air pollution has had far-reaching negative impacts on human health, ecosystems, and sustainable economic development. In rapidly urbanizing and industrializing regions in particular, the deterioration of air quality is becoming increasingly severe. Traditional monitoring and evaluation methods struggle to meet the needs of accurate prediction and effective management, so advanced data analysis techniques are urgently needed, both to improve prediction accuracy and to understand the complex relationships among the various contributing factors. To overcome a key limitation of existing air quality prediction models, which are typically confined to a single city, this study systematically compared the prediction performance of six machine learning algorithms (multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), k-nearest neighbors (k-NN), and naive Bayes (NB)) using air quality data from 31 major Chinese cities for 2018 to 2022. The key findings are as follows: the GBDT model achieved the best cross-regional performance (e.g., MAE = 5.14 in Lanzhou and R² = 0.99 in Lhasa), and SHAP analysis identified PM₂.₅ and PM₁₀ as the core determinants of AQI, with larger contributions during northern winters.
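The abstract ranks the six models by MAE and R² within each city. As a minimal sketch of how those two metrics score competing predictions, the pure-Python code below assumes a per-city evaluation; the function names (`mae`, `r2`, `rank_models`) and the AQI numbers are hypothetical illustrations, not the study's code or data.

```python
from typing import Dict, List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def rank_models(
    y_true: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float, float]]:
    """Return (model, MAE, R^2) rows for one city, best (lowest MAE) first."""
    rows = [(name, mae(y_true, p), r2(y_true, p)) for name, p in predictions.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical daily AQI observations and model predictions for one city.
observed = [50.0, 80.0, 120.0, 60.0]
preds = {
    "MLR": [60.0, 70.0, 100.0, 70.0],
    "GBDT": [52.0, 78.0, 118.0, 61.0],
}
for name, m, r in rank_models(observed, preds):
    print(f"{name}: MAE={m:.2f}, R2={r:.3f}")
```

For these toy numbers GBDT ranks first (MAE = 1.75 versus 12.50 for MLR). Repeating the evaluation separately for each of the 31 cities, rather than pooling them, is what lets such a comparison speak to cross-regional performance; in practice, scikit-learn's `mean_absolute_error` and `r2_score` compute the same quantities.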
The study makes three main innovations. First, we constructed three policy-sensitive features (heating_season, heavy_pollution_alert, and festival) following the principles of interpretability (GB50736-2012), operability (MEE protocols), and generalizability (State Council holidays). Second, we uncovered spatial heterogeneity in policy effects: the heating-season indicator showed peak SHAP values (0.82 ± 0.11) in northern industrial cities (e.g., Taiyuan), contributing 58.3% (95% CI: 53.7–63.1) to PM₂.₅ interactions. Third, we translated the ML results into policy instruments: we designed a dynamic threshold mechanism (industrial restrictions are automatically triggered when SHAP > 0.7), formulated cross-regional eco-compensation schemes (e.g., Beijing compensating Hebei at RMB 860 per ton of PM₂.₅ reduction), and developed a real-time policy simulator. These advances provide a quantitative foundation for precision environmental governance, and future work could enhance them further by integrating multi-source data.
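The abstract does not describe how the three policy features, the trigger rule, or the compensation scheme are computed. The sketch below is one minimal reading under stated assumptions: the function names, the Nov 15 to Mar 15 heating window, and the alert-day and holiday sets are hypothetical. Only the 0.7 SHAP trigger and the RMB 860/ton rate come from the abstract; treating the trigger as a strict comparison on a single SHAP value, and the compensation as linear in tons reduced, are further assumptions.

```python
from datetime import date
from typing import AbstractSet, Dict

# ASSUMPTION (hypothetical): a Nov 15 to Mar 15 heating window; real
# start/end dates differ by city and year.
HEATING_START = (11, 15)  # (month, day), inclusive
HEATING_END = (3, 15)     # (month, day), inclusive

# Values stated in the abstract.
SHAP_TRIGGER = 0.7                # restrictions fire when SHAP exceeds this
COMPENSATION_RMB_PER_TON = 860.0  # Beijing-to-Hebei rate per ton of PM2.5 reduced


def heating_season(d: date, northern_city: bool) -> int:
    """1 if d lies in the assumed heating window of a northern city, else 0."""
    if not northern_city:
        return 0
    month_day = (d.month, d.day)
    # The window wraps across New Year, so it is the union of two date ranges.
    return int(month_day >= HEATING_START or month_day <= HEATING_END)


def policy_features(
    d: date,
    northern_city: bool,
    alert_days: AbstractSet[date],
    holidays: AbstractSet[date],
) -> Dict[str, int]:
    """Binary policy-sensitive features for one city-day.

    alert_days and holidays are hypothetical inputs, e.g. dates parsed from
    published heavy-pollution alerts and the State Council holiday calendar.
    """
    return {
        "heating_season": heating_season(d, northern_city),
        "heavy_pollution_alert": int(d in alert_days),
        "festival": int(d in holidays),
    }


def restriction_triggered(shap_value: float, threshold: float = SHAP_TRIGGER) -> bool:
    """Dynamic-threshold rule: trigger industrial restrictions if SHAP > threshold."""
    return shap_value > threshold


def eco_compensation(pm25_reduction_tons: float,
                     rate: float = COMPENSATION_RMB_PER_TON) -> float:
    """Linear per-ton compensation (RMB) owed for a given PM2.5 reduction."""
    if pm25_reduction_tons < 0:
        raise ValueError("reduction must be non-negative")
    return pm25_reduction_tons * rate


# Hypothetical example: a northern city on a winter heavy-pollution alert day.
day = date(2021, 1, 10)
feats = policy_features(day, True, alert_days={day}, holidays=set())
print(feats, restriction_triggered(0.82), eco_compensation(100.0))
```

Here 0.82 is the peak heating-season SHAP value reported above, so the rule fires, and a hypothetical 100-ton reduction maps to RMB 86,000. One reason to keep each feature binary is that its SHAP value can then be read directly as the contribution of that policy state being in effect.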