PoliViews: A comprehensive and modular approach to the conceptual modeling of genomic data

Bernasconi, Anna; Ceri, Stefano
2023-01-01

Abstract

The complexity of the human genome is captured by many signals, representing, for instance, DNA variations, gene expression, or structural rearrangements of DNA; a rich set of data types and formats is used to record these signals. Conceptual models can support the description and explanation of the genome’s elaborate structure and behavior. Among others, the Conceptual Schema of the Human Genome (CSG) provides a concept-oriented, top-down representation of genome behavior that is independent of data formats. The Genomic Conceptual Model (GCM) instead provides a data-oriented, bottom-up representation, targeting a well-organized, unified description of these formats. In this research, we join the two approaches to obtain PoliViews, a comprehensive model that links (1) a concepts layer, describing genome elements and their conceptual connections, with (2) a data layer, describing datasets derived from genome sequencing with specific technologies. The two layers are connected dynamically: choosing specific genomic data types in the data layer triggers the selection of a view in the concepts layer. The benefit is mutual: data records can be semantically described by high-level concepts and their links and, in turn, the continuously evolving abstract model can be extended thanks to the input provided by real datasets. PoliViews enables queries that take a holistic conceptual perspective on the genome and are directly translated into data-oriented terms and organization. Here, we demonstrate the approach by linking two major genomic data types, namely DNA variation and gene expression. For each type, we consider several prominent data sources and describe their mapping to the corresponding view in the concepts layer, enabling intra-data-type integration. Then, leveraging the connections available in the concepts layer, we show how the distinct data types can interoperate, enabling inter-data-type integration. The PoliViews approach is illustrated through several examples of biological interest and can be further extended to any kind of genomic information.
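
As a rough illustration of the two-layer idea described in the abstract, the following Python sketch models a data layer of typed datasets and a concepts layer organized into views, where choosing a data type selects a view. It is only a minimal sketch under stated assumptions: the class and function names (`Dataset`, `ConceptsLayer`, `select_view`, `bridge`, `group_by_view`), the concept names, and the source names are hypothetical placeholders and are not taken from the PoliViews paper. Treating the concepts shared by two views as the join point between data types is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A data-layer entry: one dataset of a given genomic data type."""
    source: str      # placeholder source name, not a real repository
    data_type: str   # e.g. "variation" or "expression"


@dataclass
class ConceptsLayer:
    """Concept names grouped into views, one view per data type (assumed)."""
    views: dict[str, frozenset[str]] = field(default_factory=dict)

    def select_view(self, data_type: str) -> frozenset[str]:
        # Choosing a data type in the data layer selects a view here.
        return self.views[data_type]

    def bridge(self, type_a: str, type_b: str) -> frozenset[str]:
        # Concepts common to both views: candidate join points
        # for inter-data-type integration (illustrative assumption).
        return self.views[type_a] & self.views[type_b]


def group_by_view(datasets: list[Dataset],
                  layer: ConceptsLayer) -> dict[str, list[Dataset]]:
    """Intra-data-type integration: datasets of one type share one view."""
    grouped: dict[str, list[Dataset]] = {}
    for ds in datasets:
        layer.select_view(ds.data_type)  # raises KeyError if no view exists
        grouped.setdefault(ds.data_type, []).append(ds)
    return grouped


# Hypothetical views and sources, chosen only to make the sketch runnable.
layer = ConceptsLayer(views={
    "variation": frozenset({"Gene", "Variation"}),
    "expression": frozenset({"Gene", "Expression"}),
})
datasets = [
    Dataset("source_A", "variation"),
    Dataset("source_B", "variation"),
    Dataset("source_C", "expression"),
]
grouped = group_by_view(datasets, layer)
shared = layer.bridge("variation", "expression")
```

In this toy setup, the two variation datasets map to the same view, and the two views overlap on a single concept, which is what would let a query phrased over concepts span both data types.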
2023
Conceptual modeling
Data repositories
Data integration
Biological datasets
Genomics
Scientific databases
Files in this item:
preprint_DKE__Extension_paper_ER2022.pdf: open access; Pre-Print (or Pre-Refereeing); 1.22 MB; Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1242837