Statistical and Machine Learning Analysis of PM2.5 Concentrations and Meteorological Influences


Abstract

Fine particulate matter (PM2.5) poses a significant public health risk in densely populated urban areas such as Dhaka, Bangladesh. This study analyzes PM2.5 concentrations in Dhaka and their relationship with meteorological variables from 2019 to 2024. Missing values were first imputed with a Kalman filter, which fills gaps while preserving the temporal structure of the series [1]. We then developed a suite of statistical and machine learning models, including Gradient Boosting, Elastic Net regression, Generalized Linear Models (GLMs), and an autoregressive (AR) model, to predict PM2.5 levels and identify key drivers. Our results indicate that although meteorological variables such as rainfall and wind speed have statistically significant cleansing effects, they are insufficient on their own for accurate daily PM2.5 prediction, as shown by the low explanatory power (R² ≈ 0) of the machine learning models [2]. This underscores the complexity of air pollution in Dhaka and suggests a stronger influence from non-meteorological factors such as transboundary pollution and anthropogenic activities. In contrast, the AR(15) model effectively captured the strong temporal persistence of PM2.5. Its order was selected using autocorrelation (ACF) and partial autocorrelation (PACF) analysis, which confirmed this persistence and informed the optimal lag structure. The study translates these findings into a health risk assessment using WHO Air Quality Index (AQI) categories, identifying winter as the most polluted season and noting a general improving trend in air quality from 2019 to 2024 [3], [4]. This work highlights the limitations of daily forecasting based on meteorology alone and emphasizes the need for models that integrate a broader range of predictors to effectively inform public health policy.
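
To illustrate the kind of AR workflow the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes Yule-Walker estimation via the Levinson-Durbin recursion (the abstract does not state the estimation method) and runs on a synthetic series rather than the Dhaka data. The helper names `autocovariance`, `levinson_durbin`, and `forecast_next` are hypothetical. The recursion yields the PACF values as a by-product, which is how a PACF plot can inform the choice of lag order.

```python
import random

def autocovariance(x, max_lag):
    """Biased sample autocovariance at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(gamma, order):
    """Solve the Yule-Walker equations recursively.

    Returns (phi, pacf, sigma2): AR coefficients for the requested order,
    partial autocorrelations at lags 1..order, and the innovation variance.
    """
    phi, pacf, sigma2 = [], [], gamma[0]
    for k in range(1, order + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        pacf.append(kappa)
        sigma2 *= 1.0 - kappa ** 2
    return phi, pacf, sigma2

def forecast_next(x, phi):
    """One-step-ahead forecast from the last len(phi) observations."""
    mean = sum(x) / len(x)
    return mean + sum(phi[j] * (x[-1 - j] - mean) for j in range(len(phi)))

# Synthetic stand-in for a daily PM2.5 series (assumption: an AR(2) process
# around a baseline of 80; these numbers are illustrative, not from the study).
rng = random.Random(0)
dev = [0.0, 0.0]
for _ in range(5000):
    dev.append(0.6 * dev[-1] + 0.2 * dev[-2] + rng.gauss(0.0, 1.0))
series = [80.0 + v for v in dev[2:]]

ORDER = 15  # the lag order reported in the abstract
gamma = autocovariance(series, ORDER)
phi, pacf, sigma2 = levinson_durbin(gamma, ORDER)

# Lags whose PACF exceeds the approximate 95% band, 1.96 / sqrt(n)
band = 1.96 / len(series) ** 0.5
significant_lags = [k + 1 for k, p in enumerate(pacf) if abs(p) > band]
next_value = forecast_next(series, phi)
```

On real data, the lags in `significant_lags` would be compared against the PACF plot before settling on an order, and the series would need imputation of missing days beforehand, as the abstract notes.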
