The collection of data during the routine delivery of care is changing the healthcare sector. Indeed, only from the clinical trial data it is difficult to obtain such a complete picture of the status of a patient as that provided by real-world data. However, the creation of valuable real-word evidence requires the adoption of an appropriate solution to ingest, store, and process the enormous amount of information coming from all the involved, typically heterogeneous data sources. Data lake technologies are depicted as promising solutions for enhancing data management and analysis capabilities in the healthcare domain: we can rely on them to manage the complexity of big data volume and va- riety, providing data analysts with a self-service environment in which advanced analytics can be applied. In this paper we envision the adop- tion of a data lake federation through which organizations could achieve further benefits by sharing data. Exchanging data adds new research challenges related to guaranteeing data reliability and sovereignty. For instance, the collected data should be accurately described in order to document their quality, facilitate their discovery, define security and pri- vacy policies. On the basis of the experience in Health Big Data, we are going to present an architecture for gathering real-world evidence, also identifying the research challenges from an IT perspective

Enabling Real-world Medicine with Data Lake Federation: a research perspective.

Cinzia Cappiello;Marco Gribaudo;Pierluigi Plebani;Mattia Salnitri;Letizia Tanca
2022-01-01

Abstract

The collection of data during the routine delivery of care is changing the healthcare sector. Indeed, only from the clinical trial data it is difficult to obtain such a complete picture of the status of a patient as that provided by real-world data. However, the creation of valuable real-word evidence requires the adoption of an appropriate solution to ingest, store, and process the enormous amount of information coming from all the involved, typically heterogeneous data sources. Data lake technologies are depicted as promising solutions for enhancing data management and analysis capabilities in the healthcare domain: we can rely on them to manage the complexity of big data volume and va- riety, providing data analysts with a self-service environment in which advanced analytics can be applied. In this paper we envision the adop- tion of a data lake federation through which organizations could achieve further benefits by sharing data. Exchanging data adds new research challenges related to guaranteeing data reliability and sovereignty. For instance, the collected data should be accurately described in order to document their quality, facilitate their discovery, define security and pri- vacy policies. On the basis of the experience in Health Big Data, we are going to present an architecture for gathering real-world evidence, also identifying the research challenges from an IT perspective
2022
Springer
978-3-031-23905-2
Data Lake Federation, Data Sharing, Data Management
File in questo prodotto:
File Dimensione Formato  
VLDB_DMAH_Health_Big_Data.pdf

Accesso riservato

Descrizione: Pubblicazione
: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 490.91 kB
Formato Adobe PDF
490.91 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1228652
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 3
social impact