Prediction of Air Quality Index for Cook County, Illinois

Abstract

Air quality prediction is critical for public health management and environmental policymaking, because poor air quality contributes to respiratory disease, cardiovascular conditions, and premature mortality. Previous research has shown that machine learning models can forecast air quality indices effectively by capturing complex relationships between meteorological variables and pollutant concentrations, with ensemble methods consistently outperforming traditional linear approaches. This study develops and evaluates predictive models of the daily Air Quality Index (AQI) for Cook County, Illinois, to support proactive environmental health interventions. Daily air quality data from January 2015 to October 2025 were obtained from the EPA Air Quality System and cover 20 environmental parameters, including PM2.5, ozone, nitrogen dioxide, and meteorological conditions. Feature engineering enhanced the dataset with more than 50 features, including temporal patterns, lag variables, rolling averages, and interaction terms. Eleven machine learning models were trained and evaluated, ranging from traditional regression algorithms to advanced ensemble methods (XGBoost) and deep learning architectures (MLP, LSTM). XGBoost with hyperparameter tuning was the best-performing model, explaining 88.4% of the variance (R² = 0.8842) with a mean absolute error of 7.28 AQI points. Feature importance analysis showed that ozone, PM2.5, and nitrogen dioxide were the strongest predictors, and that temporal lag features significantly improved model accuracy. These findings can help environmental agencies implement early warning systems for poor air quality days, optimize sensor deployment across Cook County's 155 monitoring sites, and develop targeted interventions during high-risk periods such as the summer months, when ozone levels peak.
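
The abstract mentions lag variables, rolling averages, and two evaluation metrics (R² and mean absolute error) without showing how they are computed. The following is a minimal sketch of those ideas in plain Python, not the authors' pipeline: the function names, the lag and window sizes, and the AQI values are all hypothetical, and the "model" is a naive persistence baseline (tomorrow equals today) used only to exercise the metrics.

```python
from statistics import mean


def add_lag_and_rolling(aqi, lag=1, window=3):
    """Turn a daily AQI series into rows with a lag feature and a trailing mean.

    Hypothetical helper: the study's actual feature definitions are not
    given in the abstract. Only past days feed each feature, so the
    target value never leaks into its own predictors.
    """
    rows = []
    start = max(lag, window)  # first day with enough history for both features
    for t in range(start, len(aqi)):
        rows.append({
            "aqi_lag": aqi[t - lag],                    # value `lag` days earlier
            "aqi_roll_mean": mean(aqi[t - window:t]),   # mean of the previous `window` days
            "target": aqi[t],                           # value to predict
        })
    return rows


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Made-up daily AQI values, for illustration only.
daily_aqi = [42, 55, 61, 48, 70, 66, 52]
rows = add_lag_and_rolling(daily_aqi, lag=1, window=3)

# Persistence baseline: predict each day's AQI as the previous day's value.
y_true = [r["target"] for r in rows]
y_pred = [r["aqi_lag"] for r in rows]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
```

A baseline like this gives a reference point: a trained model is only useful if its MAE is lower and its R² higher than what simple persistence achieves on the same held-out days.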
