A Minimum Metadataset for Data Lakes Supporting Healthcare Research

Piantella D.; Reali P.; Tanca L.
2024-01-01

Abstract

Data lakes have emerged as a solution for storing vast amounts of heterogeneous and often unstructured data, responding to the growing need for flexible data storage, integration, and analytics across domains. At the same time, the digital transformation of healthcare processes has led to an exponential increase in health records of various types, which calls for efficient data management solutions and makes this domain an ideal arena for testing data lake efficacy. In data lakes, effective metadata extraction and management are crucial for describing raw data, establishing connections, and ensuring interoperability among the datasets ingested into the lake. To address this, we propose a minimum set of metadata tailored for clinical research, which includes relevant information common to significant branches of healthcare. Our metadataset not only streamlines data ingestion processes but also enhances the accessibility and usability of healthcare datasets for research purposes. By standardizing the collected metadata within the clinical research domain, we also facilitate data integration, analysis, and exploration, supporting comprehensive data description and management within the data lake environment.
2024
CEUR Workshop Proceedings
data lakes
healthcare
interoperability
metadata

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1272379