Comparative Analysis of Machine Learning Models for Multi-Horizon PM2.5 Forecasting

Abstract

Accurate forecasting of fine particulate matter (PM2.5) concentrations is critical for public health management and environmental policy-making. This study presents a comprehensive comparison of six machine learning models for multi-horizon PM2.5 prediction: Linear Regression, Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting Decision Trees (GBDT), Multi-Layer Perceptron (MLP), and Long Short-Term Memory (LSTM). Using hourly air quality data from 11 cities in Zhejiang Province, China (January to February 2024), we evaluate model performance at three forecast horizons: 1, 6, and 24 hours ahead. Our results show that the best-performing model depends strongly on the forecast horizon. For short-term (1-hour) predictions, Linear Regression performs best (RMSE = 10.682, R² = 0.901), suggesting near-linear short-term temporal dynamics. At the 24-hour horizon, ensemble tree-based models outperform the other models, with GBDT achieving RMSE = 24.264 and R² = 0.467. Surprisingly, the deep learning approach (LSTM) underperforms the traditional machine learning methods, particularly for long-horizon forecasting. Feature importance analysis shows that the most recent PM2.5 value (lag-1) accounts for 47.8% of total feature importance and the Air Quality Index for 42.3%, highlighting the dominance of temporal autocorrelation in PM2.5 dynamics.
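A minimal sketch of a direct multi-horizon evaluation of the kind described above may help. It assumes lagged PM2.5 values are the only features, uses a chronological 80/20 train/test split, and lets ordinary least squares stand in for the Linear Regression baseline. The function names (`make_supervised`, `evaluate_horizons`), the 24-lag window, and the synthetic AR(1) series are hypothetical illustrations, not the authors' pipeline or data. The study's models also use other inputs, such as the Air Quality Index.

```python
import numpy as np


def make_supervised(series, n_lags, horizon):
    """Build (X, y) for direct h-step-ahead forecasting (illustrative helper).

    Row t holds the n_lags most recent values up to time t, most recent first,
    so column 0 is the lag-1 value. The target is series[t + horizon].
    """
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1][::-1])
        y.append(series[t + horizon])
    return np.array(X), np.array(y)


def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def evaluate_horizons(series, horizons=(1, 6, 24), n_lags=24, train_frac=0.8):
    """Return {horizon: (RMSE, R2)} on a chronological hold-out (assumed setup)."""
    results = {}
    for h in horizons:
        X, y = make_supervised(series, n_lags, h)
        split = int(len(y) * train_frac)  # no shuffling: train precedes test in time
        coef = fit_linear(X[:split], y[:split])
        pred = predict_linear(coef, X[split:])
        results[h] = (rmse(y[split:], pred), r2(y[split:], pred))
    return results


if __name__ == "__main__":
    # Synthetic AR(1) stand-in for an hourly PM2.5 series (NOT the study's data).
    rng = np.random.default_rng(0)
    n = 24 * 60  # roughly two months of hourly values
    x = np.empty(n)
    x[0] = 50.0
    for t in range(1, n):
        x[t] = 10.0 + 0.8 * x[t - 1] + rng.normal(0, 5)
    for h, (e, r) in evaluate_horizons(x).items():
        print(f"h={h:>2}h  RMSE={e:.2f}  R2={r:.3f}")
```

On a strongly autocorrelated series, R² typically falls as the horizon grows. This mirrors the qualitative pattern reported above, but the synthetic numbers say nothing about the study's actual results.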