Metadata as a Key Driver in Healthcare Data Analysis
Chiara Criscuolo;Davide Piantella;Pierluigi Reali;Maria Gabriella Signorini;Letizia Tanca
2025-01-01
Abstract
In medicine, the digitization of healthcare processes and health services is generating a vast amount of medical data. However, the sheer volume of data and the variety of formats significantly hinder the efficient sharing of data collected across different hospitals. This could compromise the quality of multicenter studies and limit the potential of modern medical research based on AI systems and machine learning analysis. In this context, the ability to extract and manage good-quality metadata is paramount: especially for heterogeneous and unstructured datasets, metadata provides valuable, ready-to-use information about a dataset without the need to analyze its content directly. Several data models are designed specifically for storing and conveniently organizing clinical metadata, such as the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), which provides a flexible solution for multiple types of healthcare data. Furthermore, since compliance with the EU AI Act is a necessary requirement for medical AI systems, metadata can also support ethical data science. The role of metadata in clinical contexts has been studied in the Health Big Data project, which aims to involve 51 Italian research hospitals (IRCCS) in order to maximize the interoperability of healthcare datasets and enhance clinical research. In this discussion paper, we describe how effective management of metadata in clinical datasets is crucial for ensuring data usability, harmonization, and ethics in AI-driven healthcare applications.


