Enabling Real-World Medicine with Data Lake Federation: A Research Perspective
Cinzia Cappiello;Marco Gribaudo;Pierluigi Plebani;Mattia Salnitri;Letizia Tanca
2022-01-01
Abstract
The collection of data during the routine delivery of care is changing the healthcare sector. Indeed, clinical trial data alone can hardly provide a picture of a patient's status as complete as the one offered by real-world data. However, creating valuable real-world evidence requires an appropriate solution to ingest, store, and process the enormous amount of information coming from all the involved, typically heterogeneous, data sources. Data lake technologies are regarded as promising solutions for enhancing data management and analysis capabilities in the healthcare domain: we can rely on them to manage the complexity of big data volume and variety, providing data analysts with a self-service environment in which advanced analytics can be applied. In this paper we envision the adoption of a data lake federation through which organizations could achieve further benefits by sharing data. Exchanging data raises new research challenges related to guaranteeing data reliability and sovereignty. For instance, the collected data should be accurately described in order to document their quality, facilitate their discovery, and define security and privacy policies. On the basis of the experience in Health Big Data, we present an architecture for gathering real-world evidence and identify the research challenges from an IT perspective.

File | Description | Version | Access | Size | Format
---|---|---|---|---|---
VLDB_DMAH_Health_Big_Data.pdf | Publication | Post-print (draft or Author's Accepted Manuscript, AAM) | Restricted | 490.91 kB | Adobe PDF