RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano

S-PIC4CHU: Semantics-Enriched Techniques for Data Preparation in Data Science

Gianvincenzo Alfano;Ilaria Bartolini;Diego Calvanese;Paolo Ciaccia;Sergio Greco;Davide Lanti;Pasquale Leonardo Lazzaro;Emilia Lenzi;Davide Martinenghi;Cristian Molinaro;Marco Patella;Letizia Tanca;Riccardo Torlone;Irina Trubitsyna

2025-01-01

Abstract

The S-PIC4CHU project deals with the crucial issue of data preparation for Data Science and Machine Learning, and aims to offer new models and techniques for addressing inaccuracy, noise, uncertainty, bias, and incompleteness of data. Although the project embraces a semantics-based approach at its core, the proposed data preparation pipeline includes data cleaning (also from an ethical viewpoint), transformation, and reduction, as well as deduplication, error detection, missing value imputation, and space transformations for multimedia data. This paper illustrates the advances on all these fronts achieved during the first months of work on the project, and sets out the forthcoming actionable objectives.


Year of publication: 2025
Book title: Proceedings of the 4th Italian Conference on Big Data and Data Science (ITADATA 2025), Turin, Italy, September 9-11, 2025
Series title: CEUR WORKSHOP PROCEEDINGS
Publication type: 04.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
itaData2025-AlfanoEtAl.pdf (open access)	287.99 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1308426
