RE.PUBLIC@POLIMI Research Publications of the Politecnico di Milano

Modeling the temporal evolution of acoustic parameters for speech emotion recognition

Ntalampiras, Stavros; Fakotakis, Nikos

2012-01-01

Abstract

In recent years, emotional content analysis of speech signals has attracted considerable attention, and researchers have built several frameworks for recognizing human emotions in spoken utterances. This paper describes a series of exhaustive experiments which demonstrate the feasibility of recognizing human emotional states by integrating low-level descriptors. Our aim is to investigate three different methodologies for integrating successive feature values: 1) short-term statistics, 2) spectral moments, and 3) autoregressive models. Additionally, we employed a newly introduced group of parameters based on the wavelet decomposition. These are compared with a baseline set composed of descriptors that are commonly used for this task. Subsequently, we experimented with fusing these sets at the feature and log-likelihood levels. The classification step is based on hidden Markov models, while several algorithms that can handle redundant information were used during fusion. We report results on the well-known and freely available BERLIN database using data from six emotional states. Our experiments show the importance of including the information captured by the set based on multiresolution analysis and the efficacy of merging successive feature values. © 2010-2012 IEEE.
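As an illustration of the first integration method named in the abstract (short-term statistics), the following is a minimal sketch only. It assumes frame-level descriptors are already available as a matrix and summarizes each block of consecutive frames by its mean and standard deviation. The function name `integrate_short_term_stats`, the window length `win`, and the step `hop` are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def integrate_short_term_stats(features, win=10, hop=5):
    """Summarize blocks of consecutive frames by per-dimension mean and std.

    features: array of shape (n_frames, n_dims) holding low-level descriptors.
    win, hop: window length and step in frames (illustrative values only,
              not the settings used in the paper).
    Returns an array of shape (n_windows, 2 * n_dims).
    """
    features = np.asarray(features, dtype=float)
    n_frames, _ = features.shape
    if n_frames < win:
        raise ValueError("need at least `win` frames")

    integrated = []
    for start in range(0, n_frames - win + 1, hop):
        block = features[start:start + win]
        # Concatenate the mean vector and the (population) std vector.
        integrated.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.vstack(integrated)


# Toy input: 20 frames of 2-dimensional descriptors.
frames = np.arange(40, dtype=float).reshape(20, 2)
integrated = integrate_short_term_stats(frames, win=10, hop=5)
print(integrated.shape)
```

Each output row could then serve as one observation vector for a downstream classifier. The spectral-moment and autoregressive variants mentioned in the abstract would replace the mean and standard deviation with other summaries of the same block.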

Year of publication: 2012

Journal title: IEEE Transactions on Affective Computing

Keywords: Acoustic signal processing; autoregressive models; speech emotion recognition; temporal feature integration; wavelet decomposition; Software; Human-Computer Interaction

Appears in types: 01.1 Journal article

Files in this item:

File	Size	Format
06 IEEE TAFC 06035665.pdf (restricted access; publisher's version)	1.05 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1004294