
Using metadata for locating genomic datasets on a global scale

Bernasconi A.
2019-01-01

Abstract

Genomic research has benefited from recent extraordinary improvements in DNA sequencing techniques, which have led to the production of an enormous number of datasets storing information such as nucleotide sequences, gene locations and expression levels, and protein-DNA interactions. As this has now become a big data problem, characterized by an underlying disorganization, there is a strong need for integrative solutions. In this paper, we focus on the management of genomic data, which we organize and locate using descriptions of experimental studies. Such documentation, also referred to as metadata, contains information that is fundamental to understanding the content of experimental samples (namely, how the biological material was extracted and processed, under which clinical conditions, and with which techniques). We propose a novel framework to manage the metadata of genomic datasets, offering a unified view over a number of heterogeneous data sources (usually big international consortia, but also small research centers) that currently expose their metadata in disorganized and very cumbersome formats. The final outcome of this work is a search platform that makes it easy to locate relevant sources for specific genomic data analysis problems.
2019
CEUR Workshop Proceedings, Workshops at CIKM 2018
Files in this record:
paper5.pdf (open access, publisher's version, Adobe PDF, 1.64 MB)


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1143897