Policy-sensitive feature-driven machine learning for urban air quality prediction and environmental governance applications

Xiaofeng Zhu
Jianwei Gu
Qian Zhang
Yan Cao

Abstract

Air pollution is one of the major environmental problems facing the world today, with far-reaching negative impacts on human health, ecosystems, and sustainable economic development. The deterioration of air quality is especially severe in areas undergoing rapid urbanization and industrialization. Traditional monitoring and evaluation methods struggle to meet the needs of accurate prediction and effective management, so advanced data analysis techniques are urgently needed to improve prediction accuracy and to clarify the complex relationships among contributing factors. To overcome the limitation of existing air quality prediction models to a single city, this study systematically compared the prediction performance of six machine learning algorithms, namely multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), k-nearest neighbor (k-NN), and naive Bayes (NB), using air quality data from 31 major cities in China from 2018 to 2022. Key findings show that the GBDT model achieved the best cross-regional performance (e.g., MAE = 5.14 in Lanzhou, R² = 0.99 in Lhasa), and SHAP analysis identified PM₂.₅ and PM₁₀ as the core AQI determinants, with heightened contributions in northern winters.
The study makes three contributions. First, we constructed three policy-sensitive features (heating_season, heavy_pollution_alert, festival) that adhere to the principles of interpretability (GB50736-2012), operability (MEE protocols), and generalizability (State Council holidays). Second, we uncovered spatial heterogeneity in policy effects: heating season indicators showed peak SHAP values (0.82 ± 0.11) in northern industrial cities (e.g., Taiyuan), contributing 58.3% (95% CI: 53.7–63.1) to PM₂.₅ interactions. Third, we turned ML into a policy instrument: we designed a dynamic threshold mechanism (industrial restrictions auto-triggered at SHAP > 0.7), formulated cross-regional eco-compensation schemes (e.g., Beijing compensating Hebei at RMB 860/ton for PM₂.₅ reductions), and developed a real-time policy simulator. These advances provide a quantitative foundation for precision environmental governance; future enhancements are possible through multi-source data integration.
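The abstract does not give the construction rules for the three policy-sensitive features or the implementation of the threshold mechanism. As a minimal sketch only, the following shows one way such binary indicators and a SHAP-based trigger might be expressed. The heating-season window (15 November to 15 March), the alert and festival dates, and all function and class names are illustrative assumptions, not the authors' code; only the feature names and the 0.7 threshold come from the abstract.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: a fixed 15 Nov to 15 Mar heating window; real windows vary by city.
HEATING_START = (11, 15)
HEATING_END = (3, 15)
# Threshold quoted in the abstract (restrictions triggered at SHAP > 0.7).
SHAP_TRIGGER = 0.7


@dataclass(frozen=True)
class PolicyFeatures:
    """Binary indicators named after the three features in the abstract."""
    heating_season: int
    heavy_pollution_alert: int
    festival: int


def in_heating_season(day: date) -> bool:
    """Return True if the day falls inside the assumed heating window."""
    month_day = (day.month, day.day)
    # The window wraps around the new year, so either condition suffices.
    return month_day >= HEATING_START or month_day <= HEATING_END


def build_features(day: date, alert_days: set, festival_days: set) -> PolicyFeatures:
    """Encode one calendar day as three 0/1 policy indicators."""
    return PolicyFeatures(
        heating_season=int(in_heating_season(day)),
        heavy_pollution_alert=int(day in alert_days),
        festival=int(day in festival_days),
    )


def restriction_triggered(heating_shap: float, threshold: float = SHAP_TRIGGER) -> bool:
    """Hypothetical trigger: fire when the SHAP value strictly exceeds the threshold."""
    return heating_shap > threshold


# Usage with placeholder dates (not taken from the study's data).
alerts = {date(2021, 1, 12)}
festivals = {date(2021, 2, 12)}
winter = build_features(date(2021, 1, 12), alerts, festivals)
summer = build_features(date(2021, 7, 1), alerts, festivals)
```

In this sketch the SHAP value is passed in as a plain number; in practice it would come from an explainer applied to the fitted model, and how the study aggregates or normalizes those values before comparing them with 0.7 is not stated in the abstract.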

Version published to 10.21203/rs.3.rs-7437937/v1 on Research Square
Sep 4, 2025

Related articles

Designing an End-to-End Urban Air Pollution Forecasting Framework: A Data-Driven Pipeline Approach

This article has 1 author:
1. Musa Milli
This article has no evaluations. Latest version: Sep 25, 2025

Machine Learning Approaches for Predicting Air Pollution Levels: A Transparent, Time-Aware Pipeline for Daily AQI in Indian Cities

This article has 2 authors:
1. Philipp Goetzinger
2. Sebastian Noy
This article has no evaluations. Latest version: Oct 13, 2025

Flood Prediction with Artificial Intelligence An Exploratory Data Analysis Approach

This article has 7 authors:
1. Arya Vithal Mane
2. Rashmi Ravindra Halkarni
3. Pallavi Mahesh Bhat
4. Amarnath Mahesh Kakatikar
5. Rajkumar Raikar
6. Rajashri Khanai
7. Salma Shamashoddin Shahapur
This article has no evaluations. Latest version: Sep 17, 2025
