Comparative Analysis of Machine Learning Models for Forecasting Urban Air Pollutants
Ivanova, Martina; Celani, Alberto; Mottola, Luca
2025-01-01
Abstract
We comprehensively compare thirteen machine learning models for forecasting urban air pollutants. The accuracy of existing prediction models varies with the specific pollutant being predicted and with the nature and size of the training set. We evaluate the models on fifteen years of IoT sensor data, comprising both meteorological and pollutant measurements, representative of a rural-industrial urban environment in the heart of the Lombardy region (Italy). While prior studies have applied machine learning to urban air pollution forecasting [3], [4], [7], few have systematically compared a diverse set of models on a long-term, 15-year dataset across multiple pollutants and training-data scenarios. Our benchmark reveals how pollutant-specific characteristics and training history affect forecasting performance. Ensemble tree-based models, particularly LightGBM, XGBoost, and Random Forest, consistently outperform the other models, especially for pollutants with strong temporal patterns such as NO2 and NO. Conversely, pollutants such as NH3 and CO prove more difficult to predict because of their irregular dynamics and weaker correlation with meteorological features. Our analysis also shows that, as expected, increasing the proportion of training data generally improves model accuracy, although the gains diminish once the training split exceeds 70-80% of the data (with the remainder held out for testing).
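The training-proportion scenarios described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the training and test sets were drawn; this sketch assumes a chronological split, the usual choice for time-series forecasting because it keeps future observations out of the training set. The helper `chronological_split`, the record length, and the list of fractions are hypothetical and are not the authors' code.

```python
from typing import Tuple


def chronological_split(n_samples: int, train_fraction: float) -> Tuple[range, range]:
    """Split time-ordered sample indices so every training index precedes every test index.

    Hypothetical helper: assumes the samples are already sorted by timestamp.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples to form a train/test split")
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    # Index of the first test sample; clamp so neither side ends up empty.
    cut = min(max(round(n_samples * train_fraction), 1), n_samples - 1)
    # No shuffling: the test set is always the most recent slice of the record.
    return range(0, cut), range(cut, n_samples)


if __name__ == "__main__":
    n = 1000  # synthetic record length, not the size of the paper's dataset
    # Sweep training proportions, mirroring the idea of varying the training share.
    for frac in (0.5, 0.6, 0.7, 0.8, 0.9):
        train_idx, test_idx = chronological_split(n, frac)
        print(f"train {frac:.0%}: {len(train_idx)} train / {len(test_idx)} test samples")
```

In a full benchmark, each model would be fit on `train_idx` and scored on `test_idx` for every fraction. Under this scheme a larger fraction also shortens the test window and moves it toward the most recent data, which is worth keeping in mind when comparing accuracy across splits.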
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


