
Designing HMMs in the age of big data

Alippi, Cesare; Ntalampiras, Stavros; Roveri, Manuel
2016

Abstract

The rise of the Big Data age has made traditional solutions for data processing and analysis unsuitable because of their high computational complexity. To address this problem, novel techniques specifically designed to analyse Big Data have recently been presented. In this context, when such a large amount of data arrives in a streaming manner, a sequential mechanism for Big Data analysis is required. In this paper we target the modelling of high-dimensional datastreams through hidden Markov models (HMMs) and introduce an HMM-based solution, named h-HMM, suitable for datastreams characterized by high dimensionality. The proposed h-HMM relies on two components. The first is a suitably defined clustering algorithm, operating in the space of the datastream dimensions, which creates clusters of highly uncorrelated datastream dimensions (as required by the theory of HMMs). The second is a two-layer hierarchy of HMMs modelling the datastreams of such clusters. Experimental results on both synthetic and real-world data confirm the advantages of the proposed solution.
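The abstract does not specify how the dimensions are clustered. The following is a minimal Python sketch of one possible rule, assuming a greedy assignment based on the absolute Pearson correlation and a hypothetical threshold of 0.3. A dimension joins the first existing cluster in which it is weakly correlated with every member, and otherwise opens a new cluster. This is an illustration only, not the authors' algorithm. The function names `pearson` and `cluster_uncorrelated_dims` are hypothetical. Each resulting cluster of dimensions would then be modelled by its own first-layer HMM, as described in the abstract.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant dimension is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_uncorrelated_dims(data: List[List[float]],
                              threshold: float = 0.3) -> List[List[int]]:
    """Greedily group dimensions so that every pair inside a cluster has
    |correlation| < threshold (hypothetical rule, not the paper's method).

    data: list of samples, each a list of d values (one per dimension).
    Returns a list of clusters, each a list of dimension indices.
    """
    d = len(data[0])
    # Transpose: one series per dimension of the datastream.
    columns = [[row[j] for row in data] for j in range(d)]
    clusters: List[List[int]] = []
    for j in range(d):
        for cluster in clusters:
            # Join the first cluster whose members are all weakly correlated with j.
            if all(abs(pearson(columns[j], columns[k])) < threshold
                   for k in cluster):
                cluster.append(j)
                break
        else:
            clusters.append([j])  # no compatible cluster: open a new one
    return clusters


if __name__ == "__main__":
    dim0 = [1, 2, 3, 4, 5, 6]
    dim1 = [2 * v for v in dim0]       # perfectly correlated with dim0
    dim2 = [1, -1, -1, -1, -1, 1]      # uncorrelated with dim0 and dim1
    rows = [[a, b, c] for a, b, c in zip(dim0, dim1, dim2)]
    print(cluster_uncorrelated_dims(rows))  # → [[0, 2], [1]]
```

Dimensions 0 and 1 end up in different clusters because they are perfectly correlated. Dimension 2 can share a cluster with dimension 0. The greedy, order-dependent rule keeps the sketch short. A real implementation would need a more principled criterion and an incremental correlation estimate suited to streaming data.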
Advances in Intelligent Systems and Computing
ISBN 9783319478975
Control and Systems Engineering; Computer Science (all)
Files in this item:
INNSBIGDATA_2016_paper_20.pdf (publisher's version, Adobe PDF, 165.55 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11311/1004496
Citations
  • PubMed Central: not available
  • Scopus: 2
  • Web of Science: not available