Machine Learning–Based Prediction of Particulate Matter and Gaseous Pollutants in Mega Cities

Hümeyra Bolakar Tosun

Abstract

Background: Air pollution remains a major public health concern in large metropolitan areas, where complex interactions between particulate matter, gaseous pollutants, and meteorological conditions shape short- and medium-term pollution dynamics. Traditional statistical approaches often struggle to capture these nonlinear and time-dependent relationships, prompting increased use of machine learning (ML) techniques for air quality prediction. In Türkiye, comprehensive ML-based forecasting studies focusing on major metropolitan areas remain limited.

Objective: This study aims to predict PM₂.₅ concentrations in Istanbul and Ankara using machine learning models and to examine the relative contribution of pollutant history and meteorological variables to short-term PM₂.₅ dynamics through interpretable modeling approaches.

Methods: Daily air quality data (PM₂.₅, PM₁₀, NO₂, SO₂, O₃, CO) were obtained from the National Air Quality Monitoring Network of the Ministry of Environment, Urbanization, and Climate Change of Türkiye. Meteorological variables were sourced from official meteorological stations. Feature engineering incorporated lagged pollutant values, moving averages, seasonal indicators, and meteorological parameters. Random Forest was selected as the primary modeling approach and evaluated using time-series cross-validation. Model interpretability was assessed through feature importance metrics and SHAP analyses. Multicollinearity diagnostics were conducted using variance inflation factors (VIF).

Results: The Random Forest model demonstrated stable predictive performance across time-series cross-validation folds, yielding an average RMSE of 5.70 µg/m³, with fold-specific RMSE values ranging from 3.71 to 7.84 µg/m³. Feature importance analysis revealed that lagged PM₁₀ concentrations (PM₁₀ lag 1) dominated the model, accounting for approximately 82.6% of the total explanatory contribution, indicating a strong short-term autoregressive structure. Meteorological variables such as wind speed and dew point exhibited smaller but consistent contributions (each <5%). SHAP-based interpretability analyses further confirmed the nonlinear influence of both pollutant persistence and meteorological conditions on PM₂.₅ predictions.

Conclusions: Machine learning models effectively capture nonlinear and time-dependent patterns in PM₂.₅ concentrations in large metropolitan areas. The findings highlight the dominant role of short-term pollutant persistence, complemented by meteorological influences. Interpretable ML approaches provide actionable insights for air quality management, supporting resource planning and early intervention strategies. The study contributes to the growing body of evidence supporting ML-based air pollution forecasting while emphasizing the importance of interpretability for policy-relevant applications.
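The evaluation protocol described in the Methods (lagged features, moving averages, and time-series cross-validation scored by RMSE) can be illustrated with a small sketch. This is not the study's code: the synthetic series, the function names, the window sizes, and the number of folds are all assumptions made for illustration, and a one-variable least-squares fit stands in for the Random Forest so that the example needs only the Python standard library.

```python
import math
import random


def make_lag_features(series, lag=1, window=3):
    """Build (features, targets): each row is (lagged value, trailing moving average).

    The moving average covers the `window` values strictly before time t,
    so no information from the target day leaks into its own features.
    """
    rows, targets = [], []
    for t in range(max(lag, window), len(series)):
        lagged = series[t - lag]
        moving_avg = sum(series[t - window:t]) / window
        rows.append((lagged, moving_avg))
        targets.append(series[t])
    return rows, targets


def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) where every training index precedes the test block."""
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        test_end = train_end + fold_size if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic, persistence-dominated daily series (hypothetical data, not the study's).
random.seed(0)
series = [20.0]
for _ in range(199):
    series.append(max(0.0, 0.8 * series[-1] + 4.0 + random.gauss(0, 3)))

X, y = make_lag_features(series)

fold_rmses = []
for train_idx, test_idx in expanding_window_splits(len(y)):
    # Fit on the past only, using the lag-1 feature, then score on the later block.
    slope, intercept = fit_line([X[i][0] for i in train_idx], [y[i] for i in train_idx])
    preds = [slope * X[i][0] + intercept for i in test_idx]
    fold_rmses.append(rmse([y[i] for i in test_idx], preds))

mean_rmse = sum(fold_rmses) / len(fold_rmses)
print("fold RMSEs:", [round(r, 2) for r in fold_rmses])
print("mean RMSE:", round(mean_rmse, 2))
```

The key design point is that each fold trains only on observations that precede its test block, which is what distinguishes time-series cross-validation from a shuffled k-fold split. Swapping the stand-in line fit for a tree ensemble would leave the split and scoring logic unchanged.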

Version published to 10.21203/rs.3.rs-8844803/v1 on Research Square
Feb 20, 2026

Related articles

Assessment and Prediction of Air Pollution Trends in Kuwait Using Machine Learning: An Analysis of PM10, CO, and SO2 and Their Environmental Health Implications

This article has 4 authors:
1. Raslan Alenezi
2. Naeema Al-Darmaki
3. Fahed Javed
4. Sami Walid Azzam
This article has no evaluations
Latest version Mar 9, 2026

Bridging Sparse Air-Quality Monitoring: Machine-Learning Sharpens Daily PM2.5 in 12 Cities Across Two Regions

This article has 2 authors:
1. Negin Rezaei Nokandeh
2. Parisa A. Ariya
This article has no evaluations
Latest version Feb 23, 2026

Comparative Analysis of Machine Learning Models for Multi-Horizon PM2.5 Forecasting

This article has 1 author:
1. Shengqi Shao
This article has no evaluations
Latest version Jan 29, 2026
