RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano

DIANA: A Knowledge-driven Framework for Data-centric AI

Camilla Sancricca

2024-01-01

Abstract

Data analysis plays a key role in companies that adopt machine learning models to support their decision-making processes. Among the phases of a machine learning pipeline, data preparation is essential for obtaining high-quality data. Data-centric AI has shifted the focus of such processes to the quality of the data rather than the performance of the machine learning model. Users from different application fields have to perform data preparation, and they frequently struggle to design effective data preparation pipelines when faced with a multitude of data quality errors and data quality improvement techniques. This highlights the need for approaches that simplify the definition of an effective data preparation pipeline. The main goal of my Ph.D. project is to design a framework that supports users in selecting the data preparation tasks to perform in a machine learning pipeline. Using a knowledge-driven approach, we aim to guide users, whether more or less experienced, through an interactive process in which recommendations, explanations, and different levels of autonomy simplify the design of an effective data preparation pipeline.

Publication year: 2024
Book title: Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference co-located with the EDBT/ICDT 2024 Joint Conference, Paestum, Italy, March 25, 2024
Series title: CEUR Workshop Proceedings
Publication type: 04.1 Conference proceedings contribution

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1295826
