RE.PUBLIC@POLIMI: research publications of Politecnico di Milano

Comparative Analysis of Machine Learning Models for Forecasting Urban Air Pollutants

Ivanova, Martina; Celani, Alberto; Mottola, Luca

2025-01-01

Abstract

We comprehensively compare thirteen machine learning models for forecasting urban air pollutants. The accuracy of existing prediction models varies with the specific pollutant being predicted, as well as with the nature and size of the training set. We examine the performance of these models using fifteen years of IoT sensor data, including both meteorological and pollutant data representative of a rural industrial urban environment in the heart of the Lombardy region (Italy). While prior studies have applied machine learning models to urban air pollution forecasting [3], [4], [7], few have systematically compared a diverse set of models using a long-term, 15-year dataset across multiple pollutants and training data scenarios. In this work, we benchmark thirteen models, revealing how pollutant-specific characteristics and training history affect forecasting performance. Ensemble tree-based models, particularly LightGBM, XGBoost, and Random Forest, consistently outperform the others, especially for pollutants with strong temporal patterns such as NO2 and NO. Conversely, pollutants such as NH3 and CO prove more challenging to predict because of their irregular dynamics and weaker correlation with meteorological features. Our analysis also reveals that increasing the proportion of training data generally enhances model accuracy, as expected, though improvements diminish beyond a 70-80% training share relative to the test data.
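The training-proportion comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the hourly series is synthetic, the function names (`chronological_split`, `seasonal_mean_forecast`, `evaluate_fractions`) are hypothetical, and a simple hour-of-day mean stands in for any of the thirteen benchmarked models. It only shows the shape of the protocol: split a time-ordered series at several training fractions without shuffling, fit on the earlier part, and score the later part with RMSE.

```python
import math
import random


def chronological_split(series, train_fraction):
    """Split a time-ordered list into (train, test) without shuffling."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def seasonal_mean_forecast(train, horizon, period=24):
    """Placeholder model (an assumption, not from the paper):
    predict each hour of the day as its mean over the training data."""
    sums = [0.0] * period
    counts = [0] * period
    for i, value in enumerate(train):
        sums[i % period] += value
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    start = len(train)  # keep the hour-of-day phase aligned with training
    return [means[(start + h) % period] for h in range(horizon)]


def evaluate_fractions(series, fractions):
    """Return {train_fraction: test RMSE} for each requested fraction."""
    results = {}
    for fraction in fractions:
        train, test = chronological_split(series, fraction)
        predicted = seasonal_mean_forecast(train, len(test))
        results[fraction] = rmse(test, predicted)
    return results


# Synthetic hourly "pollutant" series: a daily cycle plus Gaussian noise.
rng = random.Random(0)
series = [
    40 + 15 * math.sin(2 * math.pi * (t % 24) / 24) + rng.gauss(0, 3)
    for t in range(24 * 200)
]

results = evaluate_fractions(series, [0.5, 0.6, 0.7, 0.8, 0.9])
for fraction, error in results.items():
    print(f"train fraction {fraction:.1f}: RMSE = {error:.2f}")
```

Swapping `seasonal_mean_forecast` for a real regressor would leave the rest of the loop unchanged. Keeping the split chronological matters for forecasting, since a shuffled split would let the model see observations from the period it is being tested on.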

Year of publication	2025
Book title	Proceedings - 2025 21st International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2025
ISBN (International Standard Book Number)	979-8-3315-4372-3
Keywords	Air quality forecasting; data-driven modeling; ensemble models; environmental monitoring; machine learning; pollutant prediction
Appears in types	04.1 Conference proceedings contribution

Files in this item:

File	Size	Format
Comparative_Analysis_of_Machine_Learning_Models_for_Forecasting_Urban_Air_Pollutants.pdf (restricted access)	2.15 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1298433
