Workflow Characterization of a Big Data System Model for Healthcare Through Multiformalism

Covioli, T.; Dolci, T.; Azzalini, F.; Piantella, D.; Barbierato, E.; Gribaudo, M.

doi:10.1007/978-3-031-43185-2_19

The development of technologies such as cloud computing, IoT, and social networks has caused the amount of data generated daily to grow rapidly, giving rise to the Big Data trend. Big Data has also emerged in healthcare, thanks to new tools that produce massive amounts of structured and unstructured data. As a result, medical institutions are moving towards data-driven healthcare, with the goal of leveraging this data to support clinical decision-making through suitable information systems, whose performance must in turn be evaluated. A commonly used technique is modeling, which consists of evaluating a model of the system under analysis without actually implementing the system. However, an adequate performance assessment of Big Data systems requires data with a diversity of volumes and speeds that, because of the sensitivity of healthcare data, is not available. In other fields this problem is usually solved with synthetic data generators, but in healthcare these are few and not specialized for performance evaluation. This work therefore focuses on the creation of a synthetic data generator for evaluating the performance of a Big Data system model for healthcare. The reference dataset for the generator is MIMIC-III, which contains the digital health records of thousands of patients collected over multiple years. First, we analyze the dataset, adopting multiple distribution fitting techniques (e.g., phase-type fitting) to model the temporal distribution of the data. Then, we develop a generator structured as a multi-module library that allows each component to be customized; specifically, we propose a multiformalism model to reproduce patient behavior inside the hospital. Finally, we test the generator by evaluating performance in different scenarios. Through these experiments, we show the granular control that the generator offers over the synthetic data produced and the ease with which it can be adapted to different uses.
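To make the idea of fitting a temporal distribution and then generating synthetic events more concrete, the following is a minimal sketch and not the authors' generator. It assumes that inter-arrival times can be approximated by an Erlang-k distribution, one of the simplest members of the phase-type family, fitted by matching the sample mean and squared coefficient of variation. The function names (`fit_erlang`, `sample_erlang`, `generate_arrivals`), the `max_phases` cap, and the toy input values are all hypothetical; the actual work relies on MIMIC-III and may use more general phase-type fitting.

```python
import random
import statistics


def fit_erlang(samples, max_phases=50):
    """Moment-match an Erlang-k distribution to positive samples.

    Illustrative assumption: Erlang-k stands in for a general
    phase-type fit. Returns (k, rate), where rate is the rate of
    each exponential phase.
    """
    mean = statistics.fmean(samples)
    if mean <= 0:
        raise ValueError("samples must have a positive mean")
    var = statistics.pvariance(samples)
    cv2 = var / mean ** 2  # squared coefficient of variation
    if cv2 <= 0:
        k = max_phases  # no variability observed: use the phase cap
    else:
        # Erlang-k has cv^2 = 1/k, so k is roughly 1/cv^2.
        k = min(max_phases, max(1, round(1 / cv2)))
    rate = k / mean  # keeps the fitted mean equal to the sample mean
    return k, rate


def sample_erlang(k, rate, rng):
    """Draw one value as the sum of k exponential phases."""
    return sum(rng.expovariate(rate) for _ in range(k))


def generate_arrivals(k, rate, n, seed=0):
    """Generate n synthetic arrival timestamps starting from t = 0."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += sample_erlang(k, rate, rng)
        times.append(t)
    return times


if __name__ == "__main__":
    # Toy inter-arrival times (hours); NOT taken from MIMIC-III.
    observed = [2.0, 4.0, 6.0]
    k, rate = fit_erlang(observed)
    print(k, rate, generate_arrivals(k, rate, 5))
```

Scaling `rate` up or down while keeping `k` fixed changes the arrival speed without changing the shape of the distribution, which is one simple way such a generator could expose control over data velocity.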

2023-01-01

Year of publication: 2023

Book title: Computer Performance Engineering and Stochastic Modelling, EPEW 2023, ASMTA 2023

Series title: Lecture Notes in Computer Science

ISBN (International Standard Book Number): 978-3-031-43184-5; 978-3-031-43185-2

Keywords: Big Data; healthcare data; performance evaluation; synthetic data generation

Publication type: 02.1 Contribution in Volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1257604

RE.PUBLIC@POLIMI, research publications of Politecnico di Milano
