The rise of the Big Data age made traditional solutions for data processing and analysis unsuitable due to the high computational complexity. To address this problem, novel solutions specifically-designed techniques to analyse Big Data have been recently presented. In this path, when such a large amount of data arrives in a streaming manner, a sequential mechanism for the Big Data analysis is required. In this paper we target the modelling of high-dimension datastreams through hidden Markov models (HMMs) and introduce a HMM-based solution, named h-HMM, suitable for datastreams characterized by high dimensions. The proposed h-HMM relies on a suitably-defined clustering algorithm (operating in the space of the datastream dimensions) to create clusters of highly uncorrelated dimensions of the datastreams (as requested by the theory of HMMs) and a two-layer hierarchy of HMMs modelling the datastreams of such clusters. Experimental results on both synthetic and real-world data confirm the advantages of the proposed solution.

Designing HMMs in the age of big data

ALIPPI, CESARE;NTALAMPIRAS, STAVROS;ROVERI, MANUEL
2016-01-01

Abstract

The rise of the Big Data age made traditional solutions for data processing and analysis unsuitable due to the high computational complexity. To address this problem, novel solutions specifically-designed techniques to analyse Big Data have been recently presented. In this path, when such a large amount of data arrives in a streaming manner, a sequential mechanism for the Big Data analysis is required. In this paper we target the modelling of high-dimension datastreams through hidden Markov models (HMMs) and introduce a HMM-based solution, named h-HMM, suitable for datastreams characterized by high dimensions. The proposed h-HMM relies on a suitably-defined clustering algorithm (operating in the space of the datastream dimensions) to create clusters of highly uncorrelated dimensions of the datastreams (as requested by the theory of HMMs) and a two-layer hierarchy of HMMs modelling the datastreams of such clusters. Experimental results on both synthetic and real-world data confirm the advantages of the proposed solution.
2016
Advances in Intelligent Systems and Computing
9783319478975
9783319478975
Control and Systems Engineering; Computer Science (all)
File in questo prodotto:
File Dimensione Formato  
35 INNSBIGDATA_2016_paper_20.pdf

Accesso riservato

: Publisher’s version
Dimensione 165.55 kB
Formato Adobe PDF
165.55 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1004496
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact