About the Effects of Data Imputation Techniques on ML Uncertainty
Cappiello C.; Cerutti F.; Sancricca C.
2023-01-01
Abstract
Data-driven culture rests on the use of data analysis to support decision-making. Machine learning technologies and tools, in particular, are evolving quickly and becoming increasingly popular as an effective means of gaining insights from raw data. However, Machine Learning (ML) models often produce uncertain results, mainly because of their imperfect and statistical nature. In this paper, we focus on the fact that data preparation techniques can introduce additional uncertainty. Errors, missing values, and inconsistencies are frequently addressed with techniques that correct data using estimates, and thus add further uncertainty. Focusing on the specific problem of incomplete data, this paper (i) investigates the effect of imputation techniques on the uncertainty of the results, and (ii) identifies the techniques that minimize this added uncertainty.