Next Generation Indexing for Genomic Intervals

Jalili, Vahid; Matteucci, Matteo; Deldjoo, Yashar; Ceri, Stefano
2018-01-01

Abstract

Di4 (1D intervals incremental inverted index) is a multi-resolution, single-dimension indexing framework for efficient, scalable, and extensible computation of genomic interval expressions. The framework has a tri-layer architecture: the semantic layer provides orthogonal and generic means (including support for user-defined functions) of sense-making and higher-level reasoning over region-based datasets; the logical layer provides building blocks for region calculus and topological relations between intervals; and the physical layer abstracts from the underlying storage, making the model adaptable to a variety of persistence technologies, spanning from small-scale (e.g., B+tree) to large-scale (e.g., LevelDB). The extensibility of Di4 to application scenarios is demonstrated with a comparative evaluation of ChIP-seq and DNase-seq replicates. The performance of Di4 is benchmarked at small and large scale under common bioinformatics application scenarios. Di4 is freely available from https://genometric.github.io/Di4.
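The abstract does not spell out Di4's internal layout, so the following is only a minimal Python sketch of the layering idea under assumptions of our own. Interval endpoints are the keys of an ordered store (standing in for the physical layer). Each key holds the identifiers of intervals that start or stop there (a much-simplified logical layer). A left-to-right sweep over the keys evaluates a coverage-style interval expression. The names `SortedKeyStore`, `Snapshot`, `IntervalIndex`, `items_from`, and `cover` are hypothetical and are not Di4's API.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


class SortedKeyStore:
    """Hypothetical stand-in for the physical layer: an ordered key-value map.
    A real backend (e.g. a B+tree or LevelDB) would provide the same ordered
    iteration; here it is just a dict plus a sorted list of keys."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def get_or_create(self, key, factory):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._values:
            insort(self._keys, key)
            self._values[key] = factory()
        return self._values[key]

    def items_from(self, key):
        """Yield (key, value) pairs in ascending key order, starting at `key`."""
        for k in self._keys[bisect_left(self._keys, key):]:
            yield k, self._values[k]


@dataclass
class Snapshot:
    # Identifiers of intervals whose start / stop falls on this coordinate.
    starts: list = field(default_factory=list)
    stops: list = field(default_factory=list)


class IntervalIndex:
    """Simplified logical layer: endpoint snapshots for one chromosome,
    using half-open intervals [start, stop)."""

    def __init__(self, store=None):
        self.store = store if store is not None else SortedKeyStore()

    def add(self, interval_id, start, stop):
        if start >= stop:
            raise ValueError("expected a half-open interval with start < stop")
        self.store.get_or_create(start, Snapshot).starts.append(interval_id)
        self.store.get_or_create(stop, Snapshot).stops.append(interval_id)

    def cover(self, min_acc, max_acc):
        """Return maximal regions [s, e) where the number of overlapping
        intervals (the accumulation) lies in [min_acc, max_acc]."""
        if min_acc < 1 or max_acc < min_acc:
            raise ValueError("require 1 <= min_acc <= max_acc")
        regions, acc, region_start = [], 0, None
        for coord, snap in self.store.items_from(float("-inf")):
            # The accumulation after this coordinate holds until the next key.
            acc += len(snap.starts) - len(snap.stops)
            inside = min_acc <= acc <= max_acc
            if inside and region_start is None:
                region_start = coord
            elif not inside and region_start is not None:
                regions.append((region_start, coord))
                region_start = None
        return regions
```

For example, after indexing `a = [1, 5)`, `b = [3, 8)` and `c = [10, 12)`, `cover(1, 10)` yields `[(1, 8), (10, 12)]` and `cover(2, 2)` yields `[(3, 5)]`. Because `IntervalIndex` only relies on `get_or_create` and ordered iteration, a wrapper around a disk-based ordered store exposing the same two methods could in principle replace `SortedKeyStore` without touching the logical layer, which is the kind of separation the abstract describes.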
2018
Bioinformatics; Calculus; DNA; efficient query processing; genomic data management; Genomics; Index structures; Indexing; Tools; Information Systems; Computer Science Applications; Computer Vision and Pattern Recognition; Computational Theory and Mathematics


Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1078130