
All you need is noise — from feature selection to explainable industrial AI

Mattia Vallerio
2026-01-01

Abstract

Modern chemical plants record thousands of sensor tags, yet only a small fraction meaningfully influence yield, quality, or throughput. Identifying those key drivers is often more difficult than building the predictive model itself. In this work, we show that appending one or more Synthetic Noise Features (SNFs), non-informative random variables known a priori, provides a simple reference for judging variable relevance. We show the impact of this model-agnostic step across three workflows. In supervised learning, noise features establish an automatic cutoff for feature importance, guide model regularization, and signal when the dataset itself lacks predictive information. In unsupervised learning, they provide an unbiased threshold that prevents spurious anomalies and superfluous latent dimensions. Finally, we demonstrate the applicability of this approach to small datasets typical of experimental work and Design of Experiments (DoE), including Definitive Screening, Response Surface, and space-filling designs, as well as active learning using Bayesian optimization. By turning nothing but noise into a quantitative benchmark, SNFs offer an immediately deployable safeguard against overfitting and misplaced experimental effort in data-driven chemical engineering.
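
As a minimal illustration of the SNF idea described in the abstract, the Python sketch below appends a single standard normal noise column to a toy regression dataset and uses that column's importance score as the relevance cutoff for a random forest. The toy data, the model choice, and the column name "SNF" are illustrative assumptions, not the paper's exact recipe.

# Minimal sketch of the Synthetic Noise Feature (SNF) cutoff; details here
# (toy data, random forest, column names) are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy dataset: 10 candidate "sensor tags", only 3 of which are informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)
df = pd.DataFrame(X, columns=[f"tag_{i}" for i in range(10)])

# Append a Synthetic Noise Feature: a random variable known a priori to be uninformative.
df["SNF"] = rng.standard_normal(len(df))

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(df, y)
importances = pd.Series(model.feature_importances_, index=df.columns)

# The noise feature's importance acts as the relevance cutoff: only tags
# scoring above a variable known to be pure noise are kept as drivers.
cutoff = importances["SNF"]
relevant = importances[importances > cutoff].drop("SNF", errors="ignore")
print(relevant.sort_values(ascending=False))

In this sketch, any feature whose importance does not exceed that of the known-noise column is treated as irrelevant; the same comparison can be repeated with several noise columns for a more robust cutoff.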
Files in this item:
1-s2.0-S2772508126000037-main.pdf (publisher's version, open access, 6.41 MB, Adobe PDF)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1312147
Citations
  • PubMed Central: not available
  • Scopus: 0
  • Web of Science: 0