Improving Understandability and Control in Data Preparation: A Human-Centered Approach

Pucci E.; Sancricca C.; Andolina S.; Cappiello C.; Matera M.
2024-01-01

Abstract

Data preparation is the process of normalizing, cleaning, transforming, and combining data prior to processing or analysis. It is crucial for obtaining valuable results from data analysis. However, designing the most effective data preparation pipeline is often one of the biggest challenges for data analysts, consuming up to 70–80% of their time. The work presented in this paper is the first step toward designing a framework that simplifies the selection and validation of data preparation tasks. It proposes an environment with diverse levels of assistance and autonomy, accommodating data analysts’ varying skills and expertise. The requirements for the design of this new framework were elicited through in-depth interviews and think-aloud sessions with a sample of data analysts, which highlighted understandability, explainability, and continuous learning as fundamental factors. The paper discusses alternatives for enhancing these factors, including strategies that adopt Large Language Models.
Year: 2024
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISBN: 9783031610561
ISBN: 9783031610578
Keywords: Data preparation; Explainability; Human-centered Design
Files in this item:
Improving Understandability and Control - A Human-Centered Approach.pdf
Restricted access: Publisher’s version
Size 2.16 MB
Format Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1276049