Ontology-Driven Metadata Enrichment for Genomic Datasets

A. Bernasconi; A. Canakoglu; S. Ceri
2018-01-01

Abstract

Data-driven genomic research requires access to several repositories of genomic datasets produced by international consortia, which provide open access to extremely valuable and well-curated biological content. The associated metadata, which describe experimental and biological conditions, are highly heterogeneous. Consequently, dataset collection and integration are difficult: they require data conversions and term matching that must be performed by humans with biological expertise. In this paper, we present a method and tools for ontology-driven metadata enrichment. We select a few relevant features that are provided by most repositories. We then comparatively evaluate several search services that provide ontological access, and finally associate each feature with the specific ontologies best suited to describe it. We also provide an expert validation of the approach. The method and tools are deployed in a large repository of open data, which will soon be available to the research community.
Published in: CEUR Workshop Proceedings (2018)
Keywords: Data Integration, Genomic Datasets, Metadata Annotation, Open Data, Bioinformatics
Files in this item:
SWAT4HCLS2018_camera-ready.pdf (open access)
Description: Main article
Version: Post-print (draft or Author's Accepted Manuscript, AAM)
Size: 280.94 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1076522
Citations
  • Scopus 6