RE.PUBLIC@POLIMI Research Publications of the Politecnico di Milano

Causal feature selection via transfer entropy

Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

2024-01-01

Abstract

Machine learning algorithms are designed to capture complex relationships between features. In this context, the high dimensionality of data often results in poor model performance, with the risk of overfitting. Feature selection, the process of selecting a subset of relevant and non-redundant features, is an essential step to mitigate these issues. However, classical feature selection approaches do not inspect the causal relationship between features and the target variable, which can lead to misleading results in real-world applications. Causal discovery, instead, aims to identify causal relationships between features from observational data. In this paper, we propose a novel methodology at the intersection of feature selection and causal discovery, focusing on time series. We introduce a causal feature selection approach that relies on the forward and backward feature selection procedures and leverages transfer entropy to estimate the causal flow of information. In this context, we provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases. Finally, we present numerical validations on synthetic and real-world regression problems, showing results competitive with the considered baselines.
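The abstract describes forward feature selection driven by transfer entropy. The Python sketch below illustrates that general idea under simplifying assumptions that are ours, not the paper's: discrete-valued series, a history length of 1, a plug-in (histogram) estimator, and a hypothetical stopping `threshold`. All function names are illustrative; the paper's actual algorithm, estimators, backward procedure, and guarantees are not reproduced here. At each step the loop picks the candidate with the largest transfer entropy to the target, conditioned on the target's own past and on the features already selected.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def conditional_mutual_information(a: Sequence[Hashable], b: Sequence[Hashable],
                                   c: Sequence[Hashable]) -> float:
    """Plug-in estimate (in nats) of I(A; B | C) from aligned discrete samples."""
    n = len(a)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    cmi = 0.0
    for (ai, bi, ci), count in n_abc.items():
        # p(a,b,c) * log[p(a,b,c) p(c) / (p(a,c) p(b,c))]; the 1/n factors cancel in the log.
        cmi += (count / n) * math.log(count * n_c[ci] / (n_ac[(ai, ci)] * n_bc[(bi, ci)]))
    return cmi


def transfer_entropy(source: Sequence[Hashable], target: Sequence[Hashable],
                     conditioning: Sequence[Sequence[Hashable]] = ()) -> float:
    """Lag-1 (conditional) transfer entropy from source to target:
    I(target_t ; source_{t-1} | target_{t-1}, conditioning_{t-1})."""
    future = target[1:]
    past_source = source[:-1]
    past_context = [(target[t],) + tuple(z[t] for z in conditioning)
                    for t in range(len(target) - 1)]
    return conditional_mutual_information(future, past_source, past_context)


def forward_selection(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable],
                      threshold: float = 0.01) -> List[str]:
    """Greedily add the feature with the largest transfer entropy to the target,
    conditioned on the already selected features; stop when the best score < threshold.
    `threshold` is a hypothetical, hand-picked stopping rule."""
    selected: List[str] = []
    remaining = list(features)  # insertion order keeps tie-breaking deterministic
    while remaining:
        conditioning = [features[name] for name in selected]
        scores = {name: transfer_entropy(features[name], target, conditioning)
                  for name in remaining}
        best = max(remaining, key=lambda name: scores[name])
        if scores[best] < threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: y copies x with a one-step delay; z is independent noise.
rng = random.Random(0)
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]
z = [rng.randint(0, 1) for _ in range(n)]
y = [0] + x[:-1]

print(round(transfer_entropy(x, y), 3))  # close to log(2), about 0.693 nats
print(forward_selection({"x": x, "z": z}, y))  # → ['x']
```

Once `x` is selected, the target's next value is fully determined by the conditioning set, so the conditional transfer entropy of `z` drops to zero and the loop stops. Plug-in estimates are biased upward on small samples, which is why a fixed threshold like this one is only a rough stand-in for a principled stopping criterion.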

Year of publication: 2024
Book title: 2024 International Joint Conference on Neural Networks (IJCNN)
ISBN (International Standard Book Number): 979-8-3503-5931-2
Appears in item types: 04.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
Causal_Feature_Selection_via_Transfer_Entropy.pdf (restricted access; description: Paper: Publisher's version)	1.1 MB	Adobe PDF
2310.11059v1.pdf (open access; description: arXiv: Pre-Print (or Pre-Refereeing))	450.13 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1286306
