Towards massive spatial data validation with SpatialHadoop

Negri, Mauro; Pelagatti, Giuseppe
2016-01-01

Abstract

Spatial data usually encapsulate a semantic characterization of features that carries important meaning and relations among objects, such as the containment between the extent of a region and the extents of its constituent parts. The GeoUML methodology allows one to bridge the gap between the definition of spatial integrity constraints at the conceptual level and the realization of validation procedures. In particular, it automatically generates SQL validation queries from a conceptual specification using predefined SQL templates. These queries can be used to check data stored in spatial relational databases such as PostGIS. However, quality requirements and the amount of available data are growing considerably, making the execution of these validation procedures unfeasible. The map-reduce paradigm can be applied effectively in this context, since the same test can be performed in parallel on different data chunks and the partial results can then be combined to obtain the final set of violating objects. Pigeon is a data-flow language defined on top of SpatialHadoop that provides spatial data types and functions. The aim of this paper is to explore the possibility of extending the GeoUML methodology by automatically producing Pigeon validation procedures from a set of predefined Pigeon macros. These scripts can be run in a map-reduce environment, making the validation of large datasets feasible.
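The map-reduce argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GeoUML or Pigeon implementation. It makes two simplifying assumptions: axis-aligned rectangles stand in for real geometries, and all names (`Rect`, `Part`, `check_chunk`, `validate`) are hypothetical. The sketch checks a containment constraint of the kind the abstract mentions ("every part lies within its region") independently on each data chunk, which is the map step. It then combines the partial sets of violating objects by set union, which is the reduce step.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for a real geometry: an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Rect") -> bool:
        # True if `other` lies entirely inside this rectangle (touching the boundary is allowed).
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and other.xmax <= self.xmax and other.ymax <= self.ymax)


@dataclass(frozen=True)
class Part:
    """A constituent part that declares which region it belongs to (hypothetical schema)."""
    part_id: str
    region_id: str
    extent: Rect


def check_chunk(chunk: Sequence[Part], regions: Dict[str, Rect]) -> FrozenSet[str]:
    """Map step: apply the same containment test to every part in one chunk and
    return the IDs of the violating parts (including parts whose region is missing)."""
    return frozenset(
        p.part_id for p in chunk
        if p.region_id not in regions or not regions[p.region_id].contains(p.extent)
    )


def validate(parts: List[Part], regions: Dict[str, Rect], n_chunks: int = 4) -> FrozenSet[str]:
    """Split the parts into chunks, check the chunks concurrently (map), and union
    the partial results (reduce) into the final set of violating objects."""
    chunks = [parts[i::n_chunks] for i in range(n_chunks)]  # round-robin split
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda chunk: check_chunk(chunk, regions), chunks)
        return reduce(frozenset.union, partials, frozenset())


# Usage with made-up data.
regions = {"R1": Rect(0, 0, 10, 10), "R2": Rect(10, 0, 20, 10)}
parts = [
    Part("p1", "R1", Rect(1, 1, 4, 4)),    # inside R1
    Part("p2", "R1", Rect(8, 2, 12, 5)),   # extends beyond R1: violation
    Part("p3", "R2", Rect(11, 1, 19, 9)),  # inside R2
    Part("p4", "R3", Rect(0, 0, 1, 1)),    # unknown region: violation
]
print(sorted(validate(parts, regions)))  # ['p2', 'p4']
```

The test is evaluated per object, so each violation is detected in exactly the chunk that holds the offending part. The union of the partial sets therefore equals the result of checking the whole dataset at once. The region table is shipped whole to every chunk, which assumes that regions are few compared with parts. A real SpatialHadoop job would more likely partition the data spatially. The thread pool here only illustrates the map/reduce structure and does not provide cluster-scale parallelism.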
2016
Proc. 5th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial '16)
ISBN 978-1-4503-4581-1
File in this item: 4_Towards.pdf (publisher's version, Adobe PDF, 430.82 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1031892
Citations
  • PMC: not available
  • Scopus: 10
  • ISI: not available