RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

About the Effects of Data Imputation Techniques on ML Uncertainty

Cappiello C.; Cerutti F.; Sancricca C.; Zanelli R.

2023-01-01

Abstract

The data-driven culture rests on the importance of data analysis in supporting decision-making. In particular, machine learning technologies and tools are evolving quickly and are becoming increasingly popular as an effective means of gaining insights from raw data. However, Machine Learning (ML) models often produce uncertain results, mainly because of their imperfect and statistical nature. In this paper, we focus on the fact that data preparation techniques can introduce additional uncertainty. Errors, missing values, and inconsistencies are frequently addressed with techniques that correct data using estimates, and these estimates add further uncertainty. Focusing on the specific problem of incomplete data, this paper (i) investigates the effect of imputation techniques on the uncertainty of the results, and (ii) identifies the techniques that minimize this effect.
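To make the abstract's central claim concrete (replacing missing values with estimates adds uncertainty that propagates into a model's output), here is a minimal sketch. It is not the authors' method and does not reproduce their experiments. The toy data, the helper names (`impute_constant`, `impute_random_draw`, `ols_slope`, `slope_spread`), and the choice of a one-feature least-squares model are all hypothetical. Random-draw (hot-deck style) imputation is a stand-in chosen only for illustration, because the abstract does not say which imputation techniques the paper evaluates.

```python
import random
import statistics

# Hypothetical toy data: feature X has missing entries (None); target Y is complete.
X = [1.0, 2.0, None, 4.0, 5.0, None, 9.0, 10.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]


def impute_constant(xs, value):
    """Replace every missing entry with the same point estimate (e.g., mean or median)."""
    return [value if v is None else v for v in xs]


def impute_random_draw(xs, rng):
    """Hot-deck style: replace each missing entry with a randomly drawn observed value."""
    observed = [v for v in xs if v is not None]
    return [rng.choice(observed) if v is None else v for v in xs]


def ols_slope(xs, ys):
    """Least-squares slope b of the one-feature linear model y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_spread(xs, ys, draws=200, seed=0):
    """Refit the model on many random imputations; return (mean, stdev) of the slope."""
    rng = random.Random(seed)
    slopes = [ols_slope(impute_random_draw(xs, rng), ys) for _ in range(draws)]
    return statistics.fmean(slopes), statistics.stdev(slopes)


observed = [v for v in X if v is not None]
# Two single-value imputations give two different "final" models.
slope_mean_imp = ols_slope(impute_constant(X, statistics.fmean(observed)), Y)
slope_median_imp = ols_slope(impute_constant(X, statistics.median(observed)), Y)
# Repeated random imputations expose how much the slope depends on the imputed values.
mu, sd = slope_spread(X, Y)

print(f"mean imputation:   slope = {slope_mean_imp:.3f}")
print(f"median imputation: slope = {slope_median_imp:.3f}")
print(f"random-draw imputation (200 runs): slope = {mu:.3f} +/- {sd:.3f}")
```

Mean and median imputation each return a single model whose slope looks exact. The spread across repeated random imputations shows the variability that a single imputed value hides. This is the kind of added uncertainty the abstract refers to, although the paper's own measures of uncertainty may differ.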

Year of publication: 2023
Book title: Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023
Series title: CEUR WORKSHOP PROCEEDINGS
Keywords: Data Imputation; Data Quality; Uncertainty
Publication type: 04.1 Contribution in conference proceedings

Files in this item:

No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1261164
