Comparison of Selected Ensemble Supervised Learning Algorithms Used for Meteorological Normalisation of Particulate Matter (PM10)
Abstract
Air pollution, particularly PM10 particulate matter, poses significant health risks, including respiratory and cardiovascular diseases as well as cancer. Accurate identification of the factors driving PM10 reduction is therefore essential for developing effective sustainable development strategies. According to the current state of knowledge, machine learning methods are most frequently employed for this purpose because they outperform classical statistical approaches. This study evaluated the performance of three machine learning algorithms, namely Decision Tree (CART), Random Forest, and the rule-based Cubist model, in predicting PM10 concentrations and estimating long-term trends following meteorological normalisation. The analysis covered Tarnów, Poland, over 2010–2022 and accounted for meteorological variability. The Random Forest and Cubist models were more accurate (R² ≈ 0.88–0.89, RMSE ≈ 14 μg/m³) than CART (RMSE 19.96 μg/m³). Air temperature and boundary layer height emerged as the most significant predictive variables across all algorithms. The Cubist algorithm proved particularly effective in detecting the impact of policy interventions, making it valuable for air quality trend analysis. While the study confirmed a statistically significant decrease in PM10 concentrations of 0.83–1.03 μg/m³ per year, pollution levels still exceeded both the updated EU air quality standards from 2024 (Directive (EU) 2024/2881), which will come into force in 2030, and the more stringent WHO guidelines from 2021.
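The abstract does not spell out how meteorological normalisation is carried out, so the following is only a minimal sketch of the widely used resample-and-average idea: for each day, a fitted model predicts PM10 many times with weather values drawn at random from the whole record, and the predictions are averaged so that weather-driven variability cancels out. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a fitted Random Forest or Cubist model, the weather series is synthetic, and the coefficients are not taken from the study.

```python
import math
import random
from statistics import mean, pstdev


def toy_model(day_index, temperature, blh):
    """Hypothetical stand-in for a fitted regressor (not the study's model).

    PM10 declines slowly with time and rises in cold weather and under a
    shallow boundary layer. All coefficients are invented for illustration.
    """
    trend = 60.0 - 0.0025 * day_index                # slow long-term decline
    cold_effect = 1.5 * max(0.0, 10.0 - temperature)  # heating-season emissions
    mixing_effect = 8000.0 / blh                      # poor dispersion when shallow
    return trend + cold_effect + mixing_effect


def normalise(model, day_indices, weather, n_samples=200, seed=0):
    """Average each day's predictions over randomly resampled weather rows."""
    rng = random.Random(seed)
    normalised = []
    for day in day_indices:
        draws = [model(day, *rng.choice(weather)) for _ in range(n_samples)]
        normalised.append(mean(draws))
    return normalised


# Synthetic three-year weather record (assumed values, not observations).
weather_rng = random.Random(42)
days = list(range(365 * 3))
weather = []
for d in days:
    season = math.cos(2 * math.pi * d / 365)  # +1 in mid-winter, -1 in mid-summer
    temperature = 9.0 - 11.0 * season + weather_rng.gauss(0, 3)
    blh = max(100.0, 700.0 - 350.0 * season + weather_rng.gauss(0, 100))
    weather.append((temperature, blh))

# "Raw" series uses each day's own weather; the normalised series does not.
raw = [toy_model(d, t, b) for d, (t, b) in zip(days, weather)]
normalised = normalise(toy_model, days, weather)

print(f"std of raw series:        {pstdev(raw):.2f}")
print(f"std of normalised series: {pstdev(normalised):.2f}")
```

In this sketch the normalised series varies far less than the raw one because the seasonal weather signal is averaged out, which leaves the underlying time trend easier to estimate. With a real model, the time-related predictors would be kept fixed for each day while only the meteorological predictors are resampled.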