PoliViews: A comprehensive and modular approach to the conceptual modeling of genomic data
Bernasconi, Anna; Ceri, Stefano
2023-01-01
Abstract
The complexity of the human genome is captured by many signals, representing, for instance, DNA variations, the expression levels of genes, or structural rearrangements of the DNA; a rich set of data types and formats is used to record these signals. Conceptual models can support the description and explanation of the genome's elaborate structure and behavior. Among others, the Conceptual Schema of the Human Genome (CSG) provides a concept-oriented, top-down representation of genome behavior that is independent of data formats. The Genomic Conceptual Model (GCM) instead provides a data-oriented, bottom-up representation, targeting a well-organized, unified description of these formats. In this research, we join the two approaches in PoliViews, a comprehensive model that links (1) a concepts layer, describing genome elements and their conceptual connections, with (2) a data layer, describing datasets derived from genome sequencing with specific technologies. The two layers are connected dynamically: choosing a specific genomic data type in the data layer triggers the selection of a view in the concepts layer. The benefit is mutual: data records can be semantically described by high-level concepts by exploiting these links and, in turn, the continuously evolving abstract model can be extended with input from real datasets. PoliViews makes it possible to express queries from a holistic conceptual perspective on the genome, which are directly translated into data-oriented terms and organization. Here, we demonstrate the approach by linking two major genomic data types, namely DNA variation and gene expression. For each type, we consider several prominent data sources and describe their mapping to the corresponding view in the concepts layer, enabling intra-data-type integration. Then, leveraging the connections available in the concepts layer, we show how the distinct data types can be made interoperable, enabling inter-data-type integration.
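To make the two-layer linkage a little more concrete, here is a minimal Python sketch under strong simplifying assumptions: each view of the concepts layer is reduced to a set of concept names, each data source to a mapping from its own field names onto those concepts, and the connection between views to the concepts they share. All class, function, source, field, and concept names (`ConceptView`, `DataSource`, `select_view`, `to_concepts`, `integrate`, `variation_source_A`, `Gene`, and so on) are hypothetical illustrations, not taken from PoliViews itself, and the records are toy data.

```python
from dataclasses import dataclass


@dataclass
class ConceptView:
    """Hypothetical view of the concepts layer, reduced to a set of concept names."""
    name: str
    concepts: frozenset


@dataclass
class DataSource:
    """Hypothetical data-layer source of one genomic data type."""
    name: str
    data_type: str
    field_to_concept: dict  # source-specific field name -> concept name


# Concepts layer: one view per data type (concept names are illustrative only).
VIEWS = {
    "DNA variation": ConceptView(
        "Variation", frozenset({"Gene", "Variant", "Chromosome", "Position"})),
    "gene expression": ConceptView(
        "Expression", frozenset({"Gene", "Sample", "ExpressionValue"})),
}


def select_view(data_type: str) -> ConceptView:
    """Choosing a data type in the data layer selects a view in the concepts layer."""
    return VIEWS[data_type]


def to_concepts(source: DataSource, record: dict) -> dict:
    """Intra-data-type step: express a source record in its view's concepts."""
    view = select_view(source.data_type)
    described = {}
    for field_name, value in record.items():
        concept = source.field_to_concept.get(field_name)
        if concept in view.concepts:  # drop fields with no mapped concept
            described[concept] = value
    return described


def linking_concepts(type_a: str, type_b: str) -> frozenset:
    """Concepts shared by two views, used here as the link between data types."""
    return select_view(type_a).concepts & select_view(type_b).concepts


def integrate(records_a: list, records_b: list, concept: str) -> list:
    """Inter-data-type step: pair records that agree on a linking concept."""
    return [
        {**a, **b}
        for a in records_a
        for b in records_b
        if concept in a and a[concept] == b.get(concept)
    ]


# Two hypothetical variation sources with different field names, one expression source.
source_a = DataSource("variation_source_A", "DNA variation",
                      {"gene_symbol": "Gene", "chrom": "Chromosome", "pos": "Position", "id": "Variant"})
source_b = DataSource("variation_source_B", "DNA variation",
                      {"GENE": "Gene", "CHR": "Chromosome", "START": "Position", "VAR_ID": "Variant"})
source_c = DataSource("expression_source_C", "gene expression",
                      {"gene_name": "Gene", "sample": "Sample", "value": "ExpressionValue"})

# Toy records: both variation sources end up described by the same concepts.
variants = [
    to_concepts(source_a, {"gene_symbol": "GENE1", "chrom": "1", "pos": 1000, "id": "var1"}),
    to_concepts(source_b, {"GENE": "GENE2", "CHR": "2", "START": 2000, "VAR_ID": "var2"}),
]
expression = [to_concepts(source_c, {"gene_name": "GENE1", "sample": "S1", "value": 12.5})]

link = linking_concepts("DNA variation", "gene expression")
print(link)  # frozenset({'Gene'})
result = integrate(variants, expression, "Gene")
print(result)
```

Mapping two variation sources with different field names onto the same view plays the role of intra-data-type integration; joining variation and expression records on a concept both views share (here `Gene`) plays the role of inter-data-type integration. Since the concepts layer describes genome elements and their conceptual connections, this shared-name join is only a rough stand-in for navigating those connections.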
The PoliViews approach is illustrated through several examples of biological interest and can be further extended to any kind of genomic information.

File | Description | Size | Format
---|---|---|---
preprint_DKE__Extension_paper_ER2022.pdf | Open access; Pre-Print (or Pre-Refereeing) | 1.22 MB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.