Using metadata for locating genomic datasets on a global scale
Bernasconi A.
2019-01-01
Abstract
Genomic research has benefited from recent extraordinary improvements in DNA sequencing techniques, which have led to the production of an enormous number of datasets storing information such as nucleotide sequences, gene locations and expression levels, and protein-DNA interactions. As this has become a big data problem characterized by underlying disorganization, there is a strong need for integrative solutions. In this paper, we focus on the management of genomic data, which we organize and locate using descriptions of experimental studies. Such documentation, also referred to as metadata, contains the information needed to understand the content of experimental samples (namely, how the biological material was extracted and processed, under which clinical conditions, and with which techniques). We propose a novel framework for managing the metadata of genomic datasets, offering a unified view over a number of heterogeneous data sources (usually large international consortia, but also small research centers) that currently expose their metadata in disorganized and cumbersome formats. The final outcome of this work is a search platform that makes it easy to locate relevant sources for specific genomic data analysis problems.

File | Access | Size | Format
---|---|---|---
paper5.pdf (publisher's version) | Open access | 1.64 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.