Know Your Experiments: Interpreting Categories of Experimental Data and Their Coverage

E. Ramalli; B. Pernici
2021-01-01

Abstract

Data management in scientific domains is more important than ever due to the increasing availability of experimental data. Automatically integrating and managing this information would significantly speed up its reuse and, in particular, the development of predictive models for a given domain. However, the diversity, ambiguity, and complexity of experimental data make this hard in practice. In this work, we propose a general approach to overcome these challenges, combining a human-in-the-loop process with a new methodology for automatically understanding the semantics of experimental data, which can also be used as a data cleaning procedure. In addition, we focus on assessing the domain coverage of an experimental database using only categorical characteristics of the domain, which is essential for model validation and for understanding whether and where additional experiments are needed.
2021
Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA-Data 2021) co-located with 47th International Conference on Very Large Data Bases (VLDB 2021)
Files in this item:
File: paper5-2.pdf
Access: open access
Description: Publisher's version
Size: 1.09 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1182658