Extension of the Genomic Conceptual Model to Integrate Genome-Wide Association Studies

Comolli F.
2021-01-01

Abstract

The first human genome was sequenced around the turn of the millennium. Since then, modern biology has made great progress, thanks in part to the introduction of next-generation sequencing in the mid-2000s. The growing availability of genomic data led to the birth of tertiary analysis, which concerns sense-making and the extraction of useful biological information. To deal with data heterogeneity, many tools for genomic data integration have been introduced in the last decade, among them the Genomic Conceptual Model (GCM) and the META-BASE architecture. The latter maps data from many projects into the GCM through an integration pipeline. In this work, we propose an extension of the GCM to integrate two additional sources into the META-BASE architecture: GWAS Catalog (curated by the NHGRI and EBI institutes) and FinnGen (curated by the University of Helsinki). These two sources host Genome-Wide Association Studies (GWAS), which are useful for explaining the connection between single-nucleotide genome variations and particular traits. They are organized according to different data models but share the same data semantics. As a result of our integration efforts, we enable the interoperable use and querying of GWAS datasets together with several other genomic datasets (including TCGA, ENCODE, Roadmap Epigenomics, 1000 Genomes Project, and GENCODE).
2021
Advances in Conceptual Modeling. ER 2021.
978-3-030-88357-7
978-3-030-88358-4
Bioinformatics
Data integration
Genomic datasets
GWAS
Multiomics studies
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1189441